
Python: How to split string but preserve the non-alphanumeric characters

I am running into a problem when dealing with this:

Sample string: \"H\00E6tta\"

*\00E6 is a Unicode code point, and my script is able to understand it even though it is not in the usual form æ. So please do not worry about that part.
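To make the notation concrete: assuming the four characters after the backslash are a hexadecimal Unicode code point, Python's built-in int() and chr() turn 00E6 into the character it names. This only illustrates the notation. It is not the asker's own conversion code.

```python
import unicodedata

code = '00E6'                   # the hex digits that follow the backslash in \00E6
char = chr(int(code, 16))       # parse as base 16, then map the code point to a character
print(hex(ord(char)))           # 0xe6
print(unicodedata.name(char))   # LATIN SMALL LETTER AE
```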

After splitting, I would expect something like:

['', '"H', '00E6tta', '"'] (the first element is empty, which is normal, since there is nothing before the first '\' when I split)

I did this:

sub_glyph = glyph.split("\\")

However, this is the result I got:

['', 'H', '00E6tta', '']

Any clue? I need to keep the " so that it can be converted into Unicode as well, but it has simply gone missing. I am confused: I thought I was splitting on '\\', so why does the " disappear? I can't find a helpful guide online and need help.

Thanks

Use a raw string (prefixing a string literal with r makes it a raw string) and split it:

s = r'\"H\00E6tta\"'

print(s.split('\\'))
# ['', '"H', '00E6tta', '"']
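If you also want the backslashes themselves kept in the result (closer to what the title asks), re.split with a capturing group returns each separator as its own element. This is an alternative technique, sketched here for comparison with the plain str.split call above.

```python
import re

s = r'\"H\00E6tta\"'
# Wrapping the pattern in a capturing group makes re.split keep every matched backslash.
parts = re.split(r'(\\)', s)
print(parts)
# ['', '\\', '"H', '\\', '00E6tta', '\\', '"']

# Joining the pieces gives back the original string unchanged.
print(''.join(parts) == s)  # True
```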

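Note: In a raw string, a backslash is kept as an ordinary character instead of starting an escape sequence. So \" stays as a backslash followed by a quote, and \00 is not turned into a NUL character. repr(s) shows this as '\\"H\\00E6tta\\"', where each literal backslash is displayed doubled. Because the backslashes are actually in the string, split('\\') has something to split on.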
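The difference is easy to check directly. Below, the same characters are typed once as a regular literal and once as a raw literal (the variable names are just for illustration). In the regular literal, \" becomes a plain quote and \00 becomes a NUL character, so no backslash is left in the string at all.

```python
regular = '\"H\00E6tta\"'   # \" is an escaped quote, \00 is an octal escape for NUL
raw = r'\"H\00E6tta\"'      # r prefix: every backslash stays in the string

print(len(regular), len(raw))   # 9 13
print(regular.split('\\'))      # ['"H\x00E6tta"']  (no backslash left to split on)
print(raw.split('\\'))          # ['', '"H', '00E6tta', '"']
```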
