
Python string to list - list comprehension

The input is a string and the output should be a list in which each element is one word from the string. A word is defined as a sequence of letters and/or digits. For example, Ilove is a word and 45tgfd is a word, but 54fss. is not a word because it contains a period.

Let us assume that commas come only after a word.

For example - 'Donald John Trump, born June 14, 1946, is the 45th' should become ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

I tried [x.rstrip(',') for x in line.split() if x.rstrip(',').isalpha() or x.rstrip(',').isdigit()], where line is the original string, but it got messy and gave the wrong result: it fails to detect '45th', because a token that mixes digits and letters satisfies neither isdigit nor isalpha.

Any ideas?

You are looking for str.isalnum :

>>> [x for x in (s.rstrip(',') for s in line.split()) if x.isalnum()]
['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']
>>>

Note, too, that by using a generator expression inside the comprehension I avoid calling rstrip redundantly, and this also lets me make only a single pass over line.split() .
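For reuse, the same idea can be wrapped in a small function. This is only a sketch: the name extract_words is made up for illustration, and it assumes, as the question does, that commas appear only at the end of a word. Keep in mind that str.isalnum also accepts non-ASCII letters and digits, which may or may not be what you want.

```python
def extract_words(line):
    """Return the whitespace-separated tokens of `line` that consist only of
    letters and/or digits, after dropping any trailing commas.

    `extract_words` is a hypothetical helper name, not part of any library.
    """
    # Generator expression: each token is stripped exactly once, lazily.
    stripped = (token.rstrip(',') for token in line.split())
    # isalnum() is False for the empty string, so a lone ',' is dropped too.
    return [token for token in stripped if token.isalnum()]


line = 'Donald John Trump, born June 14, 1946, is the 45th'
print(extract_words(line))
# ['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

print(extract_words('Ilove 45tgfd 54fss.'))
# ['Ilove', '45tgfd']
```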

>>> import re

>>> s = 'Donald John Trump, born June 14, 1946, is the 45th'
>>> [i.strip(',') for i in re.split(r'\s+',s) if not re.search(r'^[\.]|\w+\.\w+|[\.]$',i)]
['Donald', 'John', 'Trump', 'born', 'June', '14', '1946', 'is', 'the', '45th']

>>> s2 = 'tes.t .test test. another word'
>>> [i.strip(',') for i in re.split(r'\s+',s2) if not re.search(r'^[\.]|\w+\.\w+|[\.]$',i)]
['another', 'word']
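The pattern above works by rejecting tokens that contain a period, so a token with some other punctuation mark, such as 'hi!' or 'a-b', would still get through. If that matters, one possible alternative is to accept only tokens that match the word definition in full. The sketch below assumes ASCII letters and digits are what is wanted; the names WORD and extract_ascii_words are made up for illustration.

```python
import re

# Assumption: a "word" is one or more ASCII letters and/or digits.
WORD = re.compile(r'[A-Za-z0-9]+')


def extract_ascii_words(text):
    """Keep only tokens that, after trailing commas are removed,
    consist entirely of ASCII letters and digits."""
    tokens = (token.rstrip(',') for token in text.split())
    # fullmatch() succeeds only if the whole token matches the pattern.
    return [token for token in tokens if WORD.fullmatch(token)]


print(extract_ascii_words('tes.t .test test. another word'))
# ['another', 'word']

print(extract_ascii_words('hi! a-b ok'))
# ['ok']
```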

