When using the Python string function split(), does anybody have a nifty trick to treat items surrounded by double-quotes as a non-splitting word?
Say I want to split only on white space and I have this:
>>> myStr = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P Q" R'
>>> myStr.split()
['A', 'B', '"C"', 'DE', '"FE"', '"GH', 'I', 'JK', 'L"', '""', '""', '"O', 'P', 'Q"', 'R']
I'd like to treat anything within double-quotes as a single word, even if it contains embedded white space, so I would like to end up with this:
['A', 'B', 'C', 'DE', 'FE', 'GH I JK L', '', '', 'O P Q', 'R']
Or at least this and then I'll strip off the double-quotes:
['A', 'B', '"C"', 'DE', '"FE"', '"GH I JK L"', '""', '""', '"O P Q"', 'R']
Any non-regex suggestions?
You won't be able to get this behaviour with str.split(). If you can live with the rather complex parsing it does (like ignoring double quotes preceded by a backslash), shlex.split() might be what you are looking for:
>>> import shlex
>>> shlex.split(myStr)
['A', 'B', 'C', 'DE', 'FE', 'GH I JK L', '', '', 'O P Q', 'R']
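If you would rather keep the quotes (the second output in the question), shlex.split() also accepts posix=False. A small sketch of both modes, plus the backslash handling mentioned above; the name quoted_line is made up for illustration, and the comments describe my understanding of the two modes rather than the full parsing rules:

```python
import shlex

my_str = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P Q" R'

# Default (POSIX) mode: the surrounding quotes are removed from each token.
stripped = shlex.split(my_str)

# Non-POSIX mode: quoted tokens keep their surrounding quotes.
kept = shlex.split(my_str, posix=False)

# In POSIX mode a backslash escapes a double quote inside double quotes,
# so the escaped quote ends up inside the token instead of closing it.
quoted_line = r'a "b \" c" d'  # hypothetical input, not from the question
escaped = shlex.split(quoted_line)

print(stripped)
print(kept)
print(escaped)
```

Note that shlex.split() raises ValueError when a quote is never closed, so wrap the call if the input is not trusted.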
@Rob: why without regexes if the regexp solution is so simple?
import re
my_str = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P Q" R'
print(re.findall(r'(\w+|".*?")', my_str))
['A', 'B', '"C"', 'DE', '"FE"', '"GH I JK L"', '""', '""', '"O P Q"', 'R']
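One caveat with \w+ is that it breaks unquoted words on punctuation, so foo-bar comes back as two items. A possible variant, only a sketch, matches a quoted run first and otherwise any run of non-whitespace, then strips the quotes in a second step. The function name split_keep_quoted is made up here:

```python
import re

my_str = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P Q" R'


def split_keep_quoted(text):
    """Split on whitespace, treating "..." runs as single items (quotes removed)."""
    # Try a complete quoted run first; otherwise take any run of non-whitespace.
    tokens = re.findall(r'"[^"]*"|\S+', text)
    # Strip the surrounding quotes only from tokens that are fully quoted.
    return [t[1:-1] if len(t) >= 2 and t[0] == t[-1] == '"' else t
            for t in tokens]


print(split_keep_quoted(my_str))
print(split_keep_quoted('foo-bar "a b"'))
```

This does not handle escaped quotes inside a quoted run; shlex is the better fit if you need that.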
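I suggest using re to search for the pattern "[^"]*" and applying string.split only to the remaining parts. You could implement a recursive function that handles all the relevant parts of the string.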
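Since the question asked for something without regexes, the same idea (leave the quoted parts alone, split only the rest) can be sketched with plain str.split. This is an iterative take rather than a recursive one, and split_outside_quotes is a made-up name. It assumes quotes are balanced and never escaped:

```python
def split_outside_quotes(text):
    """Split on whitespace, keeping double-quoted runs as single items."""
    # Splitting on '"' alternates between unquoted (even index)
    # and quoted (odd index) pieces.
    parts = text.split('"')
    if len(parts) % 2 == 0:
        # An odd number of quote characters means one was never closed.
        raise ValueError("unbalanced double quote")

    words = []
    for index, part in enumerate(parts):
        if index % 2:
            # Quoted piece: keep it whole, even if it is empty.
            words.append(part)
        else:
            # Unquoted piece: ordinary whitespace split.
            words.extend(part.split())
    return words


my_str = 'A B\t"C" DE "FE"\t\t"GH I JK L" "" ""\t"O P Q" R'
print(split_outside_quotes(my_str))
```

One thing to be aware of: a quoted run glued to an unquoted word, as in a"b c"d, comes out as three separate items with this approach.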