I am expecting a user input string which I need to split into separate words. The user may input text delimited by commas or spaces.
So for instance the text may be:
hello world this is John
or
hello,world,this,is,John
or even
hello world, this, is John
How can I efficiently parse that text into the following list?
['hello', 'world', 'this', 'is', 'John']
Thanks in advance.
Use the regular expression r'[\s,]+' to split on one or more whitespace characters (\s) or commas (,).
import re
s = 'hello world, this, is John'
print(re.split(r'[\s,]+', s))
['hello', 'world', 'this', 'is', 'John']
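One caveat is worth a quick sketch: when the input starts or ends with a delimiter, re.split returns empty strings at the edges of the list. Assuming Python 3, and using a hypothetical helper name split_words that is not part of the original answer, filtering those out might look like this:

```python
import re

def split_words(text):
    """Split on runs of whitespace and/or commas, dropping empty pieces."""
    # re.split yields '' at the edges when text starts or ends with a delimiter,
    # so keep only the non-empty pieces.
    return [word for word in re.split(r'[\s,]+', text) if word]

print(split_words('hello world, this, is John'))
print(split_words(' hello,world ,, this is John, '))
```

Both calls should give the same five words, because leading, trailing, and repeated delimiters are all absorbed.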
Since you need to split on spaces and other special characters, the best regex would be \W+. Quoting from the Python re documentation:
\W
When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] plus characters classified as not alphanumeric in the Unicode character properties database.
For example:
data = "hello world, this, is John"
import re
print(re.split(r"\W+", data))
# ['hello', 'world', 'this', 'is', 'John']
Or, if you have the list of special characters by which the string has to be split, you can do
print(re.split(r"[\s,]+", data))
This splits on any run of whitespace characters (\s) and commas (,).
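The two patterns behave differently once the text contains punctuation inside words. The sketch below, assuming Python 3 and a made-up sample string that is not from the question, shows that \W+ also breaks on hyphens and apostrophes, while [\s,]+ leaves them alone:

```python
import re

# Hypothetical input chosen to contain a hyphen and an apostrophe.
data = "well-known words, don't split"

# \W+ treats every non-word character as a delimiter.
by_non_word = re.split(r"\W+", data)

# [\s,]+ treats only whitespace and commas as delimiters.
by_space_or_comma = re.split(r"[\s,]+", data)

print(by_non_word)
print(by_space_or_comma)
```

If words such as "don't" should survive intact, the explicit character class is probably the safer choice.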
>>> s = "hello world this is John"
>>> s.split()
['hello', 'world', 'this', 'is', 'John']
>>> s = "hello world, this, is John"
>>> s.split()
['hello', 'world,', 'this,', 'is', 'John']
The first one is correctly parsed by split with no arguments ;)
For the second one, you can strip the trailing comma from each piece:
>>> s = "hello world, this, is John"
>>> def notcoma(ss):
...     if ss[-1] == ',':
...         return ss[:-1]
...     else:
...         return ss
...
>>> list(map(notcoma, s.split()))
['hello', 'world', 'this', 'is', 'John']
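A regex-free variant is also possible. Stripping a trailing comma from each piece only works when every comma is followed by whitespace, so input like hello,world,this,is,John stays as a single piece. A minimal sketch, assuming it is acceptable to treat every comma as a space (the function name is hypothetical):

```python
def split_on_commas_and_spaces(text):
    """Turn commas into spaces, then split on any run of whitespace."""
    # str.split() with no arguments collapses repeated whitespace and
    # ignores leading/trailing whitespace, so no empty strings appear.
    return text.replace(',', ' ').split()

print(split_on_commas_and_spaces("hello world, this, is John"))
print(split_on_commas_and_spaces("hello,world,this,is,John"))
```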