How to find the byte position of a string within a string, not the character position?

Question

My text editor (Vim) can report the position of a string within a string, but it counts bytes, not characters.

Example:

s="I don't take an apéritif après-ski"

When I search for the word apéritif, my text editor gives the position:
16,25

Python gives this position for the same word:
16,24

Vim can execute Python code inside the editor.
In one of my Python scripts I do a lot of slicing based on these positions.
But the slices never match the correct word when the string contains accented characters.
Is there a way to resolve this in Python?
Can I find the byte position of a string within a string in Python?

Answer 1

This is, admittedly, a naive solution. You can encode both the text and the word to bytes, and then call find() on the encoded text with the encoded word as the argument. The positions then count bytes, because é and è each take two bytes in UTF-8.

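def f(text, word):
    # Encode both strings so that find() counts bytes, not characters.
    en_text = text.encode("utf-8")
    en_word = word.encode("utf-8")
    start = en_text.find(en_word)
    if start == -1:
        return None  # word does not occur in text
    return (start, start + len(en_word))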

When run as:

f("I don't take an apéritif après-ski","apéritif")

it returns (16, 25)
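
Since the question mentions slicing, here is a minimal sketch of going the other way: using byte positions to slice a Python string. It assumes the text is UTF-8 encoded and that the positions are 0-based with an exclusive end, as in the output above. The helper names slice_by_bytes and byte_to_char_offset are made up for illustration and are not part of Python or Vim.

```python
def slice_by_bytes(text, start, end, encoding="utf-8"):
    """Return the substring covered by the byte offsets [start, end)."""
    data = text.encode(encoding)
    return data[start:end].decode(encoding)


def byte_to_char_offset(text, byte_offset, encoding="utf-8"):
    """Convert a byte offset into the matching character index."""
    data = text.encode(encoding)
    # Decoding the prefix counts the characters that precede the offset.
    return len(data[:byte_offset].decode(encoding))


s = "I don't take an apéritif après-ski"
print(slice_by_bytes(s, 16, 25))   # apéritif
print(byte_to_char_offset(s, 25))  # 24
print(s[byte_to_char_offset(s, 16):byte_to_char_offset(s, 25)])  # apéritif
```

If a byte offset falls in the middle of a multi-byte character, decode() raises UnicodeDecodeError, so this only works for offsets that land on character boundaries.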

1 answer

solution1
2 ACCEPTED 2017-01-26 13:52:45