Skip to content

Strings and Text Processing

Swift’s String type is Unicode-compliant, supporting complex text processing with grapheme clusters, bidirectional text, and internationalization. This document covers string manipulation, Unicode, and advanced text processing.

Creating Strings

Example: Basic Strings:

swift
let simple = "Hello, Swift!"
let empty = String()
let fromLiteral = """
    Multi-line
    string
    """
print(simple, empty, fromLiteral)

Example: String Interpolation:

swift
let name = "Alice"
let greeting = "Hello, \(name.uppercased())!" // "Hello, ALICE!"

String Operations

  • Concatenation: Use +, +=, or append.
  • Length: Access count for grapheme clusters.
  • Access: Use indices, not integers, due to Unicode.

Example: Concatenation and Length:

swift
var message = "Hello"
message += ", World!" // "Hello, World!"
print(message.count) // 12

Example: Safe Access:

swift
let first = message[message.startIndex] // "H"
if let index = message.index(message.startIndex, offsetBy: 6, limitedBy: message.endIndex) {
    print(message[index]) // "W"
}

Unicode and Characters

String is a collection of Character (Unicode grapheme clusters), not raw code points.

Example: Unicode Handling:

swift
let emoji = "👩‍🚀🇺🇳"
for char in emoji {
    print(char) // 👩‍🚀, 🇺🇳
}
print(emoji.count) // 2 (grapheme clusters)
print(emoji.unicodeScalars.count) // 5 (code points)

Example: Combining Characters:

swift
let cafe = "café" // e + combining acute
let precomposed = "café".normalized(.NFC) // Single character
print(cafe.count, precomposed.count) // 4, 4

String Methods

  • Searching: contains, hasPrefix, hasSuffix.
  • Splitting: split, components(separatedBy.
  • Replacing: replacingOccurrences, ~replacing.

Example: Searching:

swift
print(message.hasPrefix("Hello")) // true
print(message.contains("World")) // true

Example: Splitting:

swift
let csv = "Alice,25,Engineer"
let fields = csv.split(separator: ",") // ["Alice", "25", "Engineer"]

Example: Replacing:

swift
let updated = message.replacingOccurrences(of: "World", with: "Swift")
print(updated) // "Hello, Swift!"

Regular Expressions

Swift 5.7+ supports regex literals and Regex type.

Example: Regex Matching:

swift
let text = "Contact: alice@example.com, bob@example.com"
let regex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z{2,}/]
let emails = text.matches(of: regex).map { String($0.output) }
print(emails) // ["alice@example.com", "bob@example.com"]

String Comparison

Compare strings with == (Unicode-aware) or use localized.

Example:

swift
let str1 = "café"
let str2 = "café"
print(str1 == str2) // true (normalized comparison)

let names = ["Zeina", "zéna"]
print(names.sorted { $0.localizedStandardCompare($1) < 0 }) // Case-insensitive sort

Attributed Strings

Use NSAttributedString for rich text (example:

swift
import Foundation

let attr = NSAttributedString(
    string: "Bold Text",
    attributes: [.font: UIFont.boldSystemFont(ofSize: 16)]
)

Performance Optimization

  • Avoid Repeated Access: Cache indices for iteration.
  • Use NSString: For legacy APIs or performance-critical interop.
  • Regex Caching: Reuse compiled regexes.

Example: Optimized Iteration:

swift
var indices: [String.Index] = []
var current = message.startIndex
while current < message.endIndex {
    indices.append(current)
    current = message.index(after: current)
}

Best Practices

  • Unicode Compliance: Always handle graphemes for user text.
  • Localization: Use NSLocalizedString for strings.
  • Regex for Parsing: Prefer regex over manual parsing.
  • Immutable Strings: Use let for thread safety.
  • Performance: Profile string operations for large texts.
  • Testing: Verify string handling with diverse inputs (e.g., emoji, RTL).

Troubleshooting

  • Index Errors: Use safe indexing methods.
  • Unicode Issues: Normalize strings for consistency.
  • Regex Failures: Validate regex syntax and test cases.
  • Performance Slowdowns: Profile with Instruments.
  • Localization Bugs: Test with different locales.

Example: Comprehensive String Processing

swift
struct TextProcessor {
    func extractHashtags(_ text: String) -> [String] {
        let regex = /#[a-zA-Z0-9_]+/
        return text.matches(of: regex).map { String($0.output) }
    }
    
    func normalize(_ text: String) -> String {
        return text.normalized(.NFC).lowercased()
    }
    
    func truncate(_ text: String, to length: Int) -> String {
        guard text.count > length else { return text }
        let index = text.index(text.startIndex, offsetBy: length, limitedBy: text.endIndex)!
        return String(text[..<index]) + "..."
    }
}

let processor = TextProcessor()
let post = "Hello 👋 Welcome to #SwiftCoding and #Programming! café"
print(processor.extractHashtags(post)) // ["#SwiftCoding", "#Programming"]
print(processor.normalize(post)) // "hello 👋 welcome to #swiftcoding and #programming! café"
print(processor.truncate(post, to: 20)) // "Hello 👋 Welcome to #S..."

Released under the MIT License.