I am trying to tokenize a string that contains Unicode characters, and I am having trouble removing non-ASCII tokens from the front of the string. I have tried
code = String(code[prefix.endIndex...])
and
let range = code.index(code.startIndex, offsetBy:0)..<prefix.endIndex
code.removeSubrange(range)
ASCII tokens (in prefix) are removed correctly. For example, with code = "a + b" and prefix = "a", both statements return " + b". However, with code = "← a + b" and prefix = "←", both of the above statements leave code as:
"\u{86}\u{90} a + b"
The goal is to remove the ←, so the output should be:
" a + b"
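The index prefix.endIndex belongs to prefix, not to code, so applying it to code can land inside the multi-byte encoding of ← (U+2190, UTF-8 bytes E2 86 90), which would explain the leftover "\u{86}\u{90}". Use the String methods that work on whole Characters instead.
To remove and return the first character (code must be declared with var):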
let justFirst = code.removeFirst()
To get everything except the first character as a Substring, leaving code unchanged:
let allButFirst = code.dropFirst()
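When the prefix may be longer than one character, the same idea extends by counting Characters rather than reusing an index from the prefix. The following is a minimal sketch, not code from the original post; the helper name stripPrefix is hypothetical.

```swift
// Hypothetical helper: removes `prefix` from the front of `code` if it is there.
func stripPrefix(_ prefix: String, from code: String) -> String {
    // starts(with:) compares Character by Character, so it matches
    // the unit that dropFirst(_:) counts in.
    guard code.starts(with: prefix) else { return code }
    return String(code.dropFirst(prefix.count))
}

print(stripPrefix("←", from: "← a + b")) // prints " a + b"
print(stripPrefix("a", from: "a + b"))   // prints " + b"
```

Because dropFirst(_:) counts Characters, the arrow is removed as a single unit regardless of how many bytes it occupies.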
Similarly for the last character:
let justLast = code.removeLast()
let allButLast = code.dropLast()
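The index-based statements from the question can also be made to work, provided the cut point is computed on code itself. This is a sketch under that assumption, reusing the question's variable names:

```swift
var code = "← a + b"
let prefix = "←"

// Derive the index from `code`; an index taken from `prefix`
// is only meaningful for `prefix`.
// limitedBy: returns nil instead of trapping if `code` is shorter than `prefix`.
if let cut = code.index(code.startIndex, offsetBy: prefix.count, limitedBy: code.endIndex) {
    code.removeSubrange(code.startIndex..<cut)
}

print(code) // prints " a + b"
```

This version does not check that code actually starts with prefix; it only removes as many Characters as prefix contains.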
You can also convert the string to a standard Array of Characters and work with that, if you are happy with arrays:
let array = Array(code)
This is an easy way to tokenize the string. You can then remove whatever characters you want from the resulting array of tokens.
func testTokenization() {
    let input = "← a + b"
    var tokens: [String] = []
    // Iterating a String yields whole Characters, so ← stays intact.
    for character in input {
        tokens.append(String(character))
    }
    print(tokens)
}
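As one illustration of removing unwanted characters from the token list, the sketch below drops whitespace before building the array. Treating every non-whitespace Character as its own token is an assumption made for this example, not something the original post specifies.

```swift
let input = "← a + b"

// Drop whitespace, then wrap each remaining Character in a String.
let tokens = input.filter { !$0.isWhitespace }.map { String($0) }

print(tokens) // prints ["←", "a", "+", "b"]
```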