How can I write a regex that takes each item in a list and returns only the words? I have taken some text and split it on spaces, and this is what the resulting list looks like:
['#include', '', 'using', 'namespace', 'std;', 'int', 'main()', '{',
'int', 'divisor,', 'dividend,', 'quotient,', 'remainder;', 'cout', '<<',
'"Enter', 'dividend:', '";', 'cin', '>>', 'dividend;', 'cout', '<<',
'"Enter', 'divisor:', '";', 'cin', '>>', 'divisor;', 'quotient', '=',
'dividend', '/', 'divisor;', 'remainder', '=', 'dividend', '%',
'divisor;', 'cout', '<<', '"Quotient', '=', '"', '<<', 'quotient', '<<',
'endl;', 'cout', '<<', '"Remainder', '=', '"', '<<', 'remainder;',
'return', '0;']
I need to extract only the words from it.
You can do this without a regex: split on whitespace and keep only the tokens for which str.isalpha() returns True:
context = 'text #include somefile.txt more here {} abc() finally'
words = [x for x in context.split() if x.isalpha()]
print(words) # => ['text', 'more', 'here', 'finally']
Alternatively, you may grab all the "words" you need in a single regex pass with re.findall:
import re
words = re.findall(r'(?<!\S)[a-zA-Z]+(?!\S)', context)
That way, you extract every run of one or more ASCII letters (matched by [a-zA-Z]+) that is preceded by whitespace or the start of the string and followed by whitespace or the end of the string.
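As a quick check, here is a minimal sketch that runs both approaches on the same sample string and compares the results. The variable names by_split and by_regex are only illustrative. It also shows one difference between the two: str.isalpha() accepts non-ASCII letters, whereas the [a-zA-Z] class does not.

```python
import re

context = 'text #include somefile.txt more here {} abc() finally'

# Approach 1: split on whitespace, keep tokens made only of letters.
by_split = [tok for tok in context.split() if tok.isalpha()]

# Approach 2: one regex pass; the lookarounds require whitespace
# (or a string boundary) on both sides of the run of letters.
by_regex = re.findall(r'(?<!\S)[a-zA-Z]+(?!\S)', context)

print(by_split)  # ['text', 'more', 'here', 'finally']
print(by_regex)  # ['text', 'more', 'here', 'finally']

# The two differ on non-ASCII letters: isalpha() accepts them,
# the ASCII-only character class does not.
accented_ok = 'café'.isalpha()
accented_matches = re.findall(r'(?<!\S)[a-zA-Z]+(?!\S)', 'café')
print(accented_ok)       # True
print(accented_matches)  # []
```

If the input may contain accented or other non-ASCII words, the choice between the two approaches matters; for plain ASCII source text like the example in the question, they should behave the same.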
Define a function is_word() and filter all your elements through it. Use .isalpha() inside your function: it returns True only for non-empty strings made up entirely of letters, so it is an easy way to reject elements that contain unwanted characters.
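A minimal sketch of that idea, assuming is_word() simply wraps str.isalpha() and using a shortened copy of the list from the question (the name tokens is illustrative):

```python
def is_word(token):
    """Return True if token is non-empty and consists only of letters."""
    return token.isalpha()


# A shortened copy of the list from the question.
tokens = ['#include', '', 'using', 'namespace', 'std;', 'int', 'main()', '{',
          'int', 'divisor,', 'cout', '<<', '"Enter', 'dividend:', 'return', '0;']

# filter() keeps only the elements for which is_word() returns True.
words = list(filter(is_word, tokens))
print(words)  # ['using', 'namespace', 'int', 'int', 'cout', 'return']
```

Note that tokens with punctuation attached, such as 'std;' or 'divisor,', are dropped entirely rather than cleaned up. If those should be kept as 'std' and 'divisor', the punctuation would have to be stripped before the check.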