
How to find the byte position of a substring in a string, not the character position?

My text editor (Vim) can report the position of a substring within a string, but it counts bytes, not characters.

Example:

s="I don't take an apéritif après-ski"

When I search for the word apéritif, my text editor gives the position:
16,25

Python gives this position for the same word:
16,24

Vim can execute Python code inside the editor.
One of my Python scripts does a lot of slicing based on these positions,
but it never finds the correct word when the string contains accented characters.
Is there a way to resolve this in Python?
Can I find the byte position of a substring in a string in Python?

This is, admittedly, a naive solution: encode both the text and the word to bytes, then call find() on the encoded text with the encoded word as the argument. The result is a byte offset, because accented characters such as é occupy two bytes in UTF-8.

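def f(text, word):
    # Encode both strings to UTF-8 so that find() works on byte offsets.
    en_text = text.encode("utf-8")
    en_word = word.encode("utf-8")
    start = en_text.find(en_word)
    if start == -1:
        return None  # word does not occur in text
    return (start, start + len(en_word))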

When run as:

f("I don't take an apéritif après-ski","apéritif")

returns (16, 25)
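
If the byte offsets coming from Vim then have to be used to slice a Python str, one possible approach is to convert them to character indices first. The following is only a sketch: it assumes the text is UTF-8 and that each byte offset falls on a character boundary, and the helper names byte_to_char_index and char_to_byte_index are made up for this illustration.

```python
def byte_to_char_index(text, byte_offset, encoding="utf-8"):
    """Convert a byte offset in the encoded text to a character index."""
    encoded = text.encode(encoding)
    # Decoding the prefix raises UnicodeDecodeError if byte_offset
    # falls in the middle of a multi-byte character.
    return len(encoded[:byte_offset].decode(encoding))


def char_to_byte_index(text, char_index, encoding="utf-8"):
    """Convert a character index in text to a byte offset in its encoded form."""
    return len(text[:char_index].encode(encoding))


s = "I don't take an apéritif après-ski"
start = byte_to_char_index(s, 16)  # byte offsets as reported by the editor
end = byte_to_char_index(s, 25)
print(s[start:end])
```

Alternatively, the slicing can be done directly on s.encode("utf-8") and the resulting bytes decoded afterwards, which avoids the conversion step.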
