
Unicode string modification failing

I am trying to tokenize a string that contains Unicode characters, and I am having trouble removing Unicode tokens from the front of the string. I have tried

 code = String(code[prefix.endIndex...])

and

  let range = code.index(code.startIndex, offsetBy:0)..<prefix.endIndex
  code.removeSubrange(range)

ASCII-only tokens (in prefix) are removed correctly. For example, with code = "a + b" and prefix = "a", both statements return " + b". However, with code = "← a + b" and prefix = "←", both statements leave code as:

 "\u{86}\u{90} a + b"   

The goal is to remove the ←, so the output should be:

 " a + b"

Use String's built-in, Unicode-aware removal methods instead. They operate on whole Characters rather than on raw indices:

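To remove and return the first character (this mutates code, so code must be declared with var):

let justFirst = code.removeFirst()

To get the remaining characters without the first one (code itself is unchanged, and the result is a Substring):

let allButFirst = code.dropFirst()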

Similarly for the last character:

let justLast = code.removeLast()
let allButLast = code.dropLast()
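
Applied to the prefix case from the question, a minimal sketch might look like the following. The likely cause of the garbled output is that prefix.endIndex is a String.Index belonging to prefix, and an index is only valid for the string it was created from. The sketch avoids that by counting Characters, or by computing the index inside code itself. The helper name removingPrefix is hypothetical, not a standard library function.

```swift
// Hypothetical helper (not part of the standard library).
func removingPrefix(_ prefix: String, from code: String) -> String {
    // starts(with:) compares the two strings Character by Character.
    guard code.starts(with: prefix) else { return code }
    // dropFirst counts Characters in `code`; no index from `prefix` is reused.
    return String(code.dropFirst(prefix.count))
}

let unicodeResult = removingPrefix("←", from: "← a + b")  // " a + b"
let asciiResult = removingPrefix("a", from: "a + b")      // " + b"

// Equivalent in-place form, with the end index computed inside `code`.
var code = "← a + b"
let prefix = "←"
if code.starts(with: prefix) {
    let end = code.index(code.startIndex, offsetBy: prefix.count)
    code.removeSubrange(code.startIndex..<end)
}
// code is now " a + b"
```

Counting with prefix.count keeps the removal in terms of Characters, so a multi-byte symbol such as ← is removed as one unit.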

You can also convert the string to a standard Array of Character values and work with that, if you prefer arrays:

let array = code.map { $0 }
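
As a rough illustration of why the array form can be convenient: an Array is indexed by Int, so each position is one whole Character, and a slice can be turned back into a String. The variable names below are only for illustration.

```swift
let code = "← a + b"
let characters = Array(code)  // [Character], same elements as code.map { $0 }

// Integer positions count whole Characters, not bytes.
let first = characters[0]            // "←"
let rest = String(characters[1...])  // " a + b"
```

The trade-off is an extra copy of the string's contents, which is usually fine for short inputs such as a line of source code.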

This is an easy way to tokenize the string. You can then remove whatever characters you want from the resulting array of tokens.

func testTokenization() {
    let input = "← a + b"
    var tokens: [String] = []
    for character in input {
        tokens.append(String(character))
    }
    print(tokens)
}
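
Once the tokens are in an array, removing one is an ordinary array operation. The sketch below assumes single-character tokens and assumes that whitespace should be skipped, which the loop above does not do. The function name tokenize is hypothetical.

```swift
// Hypothetical helper: one token per Character, skipping whitespace (an assumption).
func tokenize(_ input: String) -> [String] {
    return input.filter { !$0.isWhitespace }.map { String($0) }
}

var tokens = tokenize("← a + b")
// tokens == ["←", "a", "+", "b"]

// Remove a leading "←" token if present.
if tokens.first == "←" {
    tokens.removeFirst()
}

let remaining = tokens.joined(separator: " ")
// remaining == "a + b"
```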


 