
How can I write a function that takes each item in a list and removes anything that is not a letter, leaving only words?

How can I write a regex that takes each item in a list and returns only words? I have taken some text and split it on spaces, and this is what the resulting list looks like:

['#include', '', 'using', 'namespace', 'std;', 'int', 'main()', '{',
'int', 'divisor,', 'dividend,', 'quotient,', 'remainder;', 'cout', '<<',
'"Enter', 'dividend:', '";', 'cin', '>>', 'dividend;', 'cout', '<<',
'"Enter', 'divisor:', '";', 'cin', '>>', 'divisor;', 'quotient', '=',
'dividend', '/', 'divisor;', 'remainder', '=', 'dividend', '%',
'divisor;', 'cout', '<<', '"Quotient', '=', '"', '<<', 'quotient', '<<',
'endl;', 'cout', '<<', '"Remainder', '=', '"', '<<', 'remainder;',
'return', '0;']

I need to extract only the words from it.

You can do this without a regex by keeping only the items that consist entirely of letters:

context = 'text #include somefile.txt more here {} abc() finally'
words = [x for x in context.split() if x.isalpha()]
print(words) # => ['text', 'more', 'here', 'finally']

See the Python demo.
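Note that the isalpha() filter above drops an item such as 'std;' entirely rather than trimming it to 'std'. If trimming is what is wanted, a minimal sketch could remove the non-letter characters from each item first and then discard the items that end up empty. The helper name strip_non_letters is made up for illustration, and the sample list is a shortened version of the one in the question:

```python
def strip_non_letters(items):
    """Remove every non-letter character from each item; drop items left empty."""
    cleaned = (''.join(ch for ch in item if ch.isalpha()) for item in items)
    return [word for word in cleaned if word]

tokens = ['#include', '', 'using', 'namespace', 'std;', 'int',
          'main()', '{', '<<', '"Enter', '0;']
print(strip_non_letters(tokens))
# => ['include', 'using', 'namespace', 'std', 'int', 'main', 'Enter']
```

Which behaviour is right depends on whether 'main()' should count as the word 'main' or be ignored altogether.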

Alternatively, you may grab all the "words" you need in a single regex pass with re.findall:

import re
words = re.findall(r'(?<!\S)[a-zA-Z]+(?!\S)', context)

That way, you extract any run of one or more ASCII letters (matched by [a-zA-Z]+) that is preceded by whitespace or the start of the string AND followed by whitespace or the end of the string.

See the regex demo.
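The two approaches are not quite equivalent: str.isalpha() is Unicode-aware, while [a-zA-Z] matches ASCII letters only. A small sketch of the difference, using a made-up sample string:

```python
import re

context = 'text naïve more café() finally'

# str.isalpha() accepts any Unicode letter, so 'naïve' passes the test.
by_isalpha = [x for x in context.split() if x.isalpha()]

# [a-zA-Z] matches ASCII letters only, so 'naïve' is skipped.
by_regex = re.findall(r'(?<!\S)[a-zA-Z]+(?!\S)', context)

print(by_isalpha)  # => ['text', 'naïve', 'more', 'finally']
print(by_regex)    # => ['text', 'more', 'finally']
```

For C++ source like the text in the question, the two give the same result, since its words are all ASCII.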

Define a function is_word() and filter all your elements through it. Use .isalpha() inside the function: it returns True only when every character in the string is a letter, so it is an easy way to reject items that contain unwanted characters.
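A minimal sketch of that suggestion, assuming is_word() should simply wrap .isalpha() and that the built-in filter() does the filtering (the sample list is a shortened version of the one in the question):

```python
def is_word(item):
    """Return True if item is non-empty and consists only of letters."""
    return item.isalpha()

items = ['#include', '', 'using', 'namespace', 'std;', 'int',
         'main()', '{', 'cout', '<<']
words = list(filter(is_word, items))
print(words)  # => ['using', 'namespace', 'int', 'cout']
```

The empty string is rejected as well, because ''.isalpha() returns False.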

