
Tokenise line containing string literals

Using str.split on "print 'Hello, world!' times 3" returns the list ["print", "'Hello,", "world!'", "times", "3"]. However, I want the result ["print", "'Hello, world!'", "times", "3"]. How can I do that?

If you want quoted text kept together as a single token, you can use shlex.split. Note that by default it strips the quote characters themselves:

import shlex

s = "print 'Hello, world!' times 3"
print(shlex.split(s))
# ['print', 'Hello, world!', 'times', '3']
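The question asks for the quotes to stay in the token, while shlex.split drops them in its default POSIX mode. Passing posix=False appears to keep them; the sketch below compares the two modes (the variable names stripped and kept are only illustrative):

```python
import shlex

s = "print 'Hello, world!' times 3"

# Default (POSIX) mode: the quote characters are removed from the token.
stripped = shlex.split(s)

# Non-POSIX mode: the quote characters stay part of the token.
kept = shlex.split(s, posix=False)

print(stripped)
print(kept)
```

Non-POSIX mode also changes how backslashes and adjacent quoted pieces are handled, so it is worth checking against your real input before relying on it.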

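This regex keeps the quotes, if you want them. It matches each quoted string separately, so two quoted strings on one line stay separate tokens.

import re

s = "print 'hello, world!' 3 times"
# Match either a quoted run ('...') or a run of word characters.
print(re.findall(r"'[^']*'|\w+", s))
# ['print', "'hello, world!'", '3', 'times']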
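A pattern with a greedy .+ between the quotes would merge two quoted strings on the same line into one token, and \w+ drops punctuation outside quotes. As a rough sketch of a more general tokeniser, the hypothetical helper tokenise below (not part of any library) accepts single- or double-quoted strings and treats any other run of non-whitespace as a token. It assumes quotes are never escaped inside a string:

```python
import re

# A single-quoted string, a double-quoted string, or any run of
# non-whitespace characters. Escaped quotes are NOT handled (assumption).
TOKEN_RE = re.compile(r"""'[^']*'|"[^"]*"|\S+""")


def tokenise(line):
    """Split a line into tokens, keeping quoted strings (with quotes) whole."""
    return TOKEN_RE.findall(line)


print(tokenise("print 'Hello, world!' times 3"))
print(tokenise("print 'a b' and 'c d'"))
```

An unterminated quote falls through to the \S+ branch and is split on whitespace like ordinary text, which may or may not be the behaviour you want.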

str.split() splits the string on a delimiter; called with no argument, it splits on runs of whitespace. It gives no special treatment to the ' characters in your string. If you want the text between quotes treated as a single token, use the shlex module or write a regular expression. split() on its own is not what you are looking for.


 