Unicode string modification failing

Question

I am trying to tokenize a string that contains Unicode characters, and I am having trouble removing Unicode tokens from the front of the string. I have tried

 code = String(code[prefix.endIndex...])

and

  let range = code.index(code.startIndex, offsetBy:0)..<prefix.endIndex
  code.removeSubrange(range)

Non-Unicode tokens (in prefix) are removed correctly. For example, with code = "a + b" and prefix = "a", both statements produce " + b". However, with code = "← a + b" and prefix = "←", both statements leave code as:

 "\u{86}\u{90} a + b"

The goal is to remove the ←, so the output should be:

 " a + b"

Answer 1

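Use the Unicode-aware, Character-based methods that String already provides instead:

To remove and return the first character (this mutates code, so code must be declared with var):

let justFirst = code.removeFirst()

To get everything except the first character (code itself is left unchanged, and the result is a Substring):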

let allButFirst = code.dropFirst()

Similarly for the last character:

let justLast = code.removeLast()
let allButLast = code.dropLast()
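
For the prefix case in the question, here is a minimal sketch of how these calls might be combined. A String.Index is only meaningful for the string it was created from, so prefix.endIndex should not be applied to code; the sketch counts Characters instead, or derives the index from code itself. The helper name stripPrefix is hypothetical and not part of the standard library.

```swift
// Hypothetical helper (not from the original post): returns `code` without
// `prefix` when `code` starts with it, otherwise returns `code` unchanged.
func stripPrefix(_ prefix: String, from code: String) -> String {
    guard code.hasPrefix(prefix) else { return code }
    // dropFirst counts Characters, so a multi-byte character such as "←"
    // is dropped as a whole rather than byte by byte.
    return String(code.dropFirst(prefix.count))
}

let stripped = stripPrefix("←", from: "← a + b")
let plain = stripPrefix("a", from: "a + b")
print(stripped)
print(plain)

// In-place variant: the end index is computed from `mutableCode` itself,
// never borrowed from the prefix string.
var mutableCode = "← a + b"
let prefixEnd = mutableCode.index(mutableCode.startIndex, offsetBy: "←".count)
mutableCode.removeSubrange(mutableCode.startIndex..<prefixEnd)
print(mutableCode)
```

Both variants measure the prefix in Characters, which is why they behave the same for "a" and for "←".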

You can also convert the string to a standard Array of Character values and work with that, if you prefer arrays:

let array = code.map { $0 }
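
As a rough illustration of why the array route is safe, the sketch below compares the number of Characters with the number of UTF-8 bytes for the string from the question. The variable names are only for illustration.

```swift
let code = "← a + b"

// Array(code) yields [Character]; each element is one user-perceived character.
let characters = Array(code)
let characterCount = characters.count
let byteCount = code.utf8.count
print(characterCount)  // 7 characters
print(byteCount)       // 9 bytes, because "←" takes three bytes in UTF-8

// Dropping the first array element removes the whole "←", not a single byte.
let rest = String(characters.dropFirst())
print(rest)
```

Array offsets here are Character offsets, so plain integer indexing works as expected.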

Answer 2

This is an easy way to tokenize the string, one character per token. You can then remove whatever characters you want from the resulting tokens array.

func testTokenization() {
    let input = "← a + b"
    var tokens: [String] = []
    for character in input {
        tokens.append(String(character))
    }
    print(tokens)
}
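
If the tokens are meant to be separated by whitespace rather than being single characters (an assumption, since the question does not say), a variant along these lines might be closer to what is needed. It splits on whitespace and then drops the leading token from the array.

```swift
let input = "← a + b"

// Split on any whitespace Character; empty pieces are omitted by default.
let tokens = input.split(whereSeparator: { $0.isWhitespace }).map { String($0) }
print(tokens)

// Remove the leading token by working on the array, not on string indices.
let remaining = Array(tokens.dropFirst())
print(remaining)
```

Because the split works on Character values, the "←" ends up as a single token and can be removed like any other.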

2 answers

solution1
0 2019-08-22 19:43:50

solution2
0 2019-08-22 19:44:12
