I have this regex that works fine in http://regexpal.com/ :
[^-:1234567890/.,\s]*
I am trying to extract just the words from a paragraph full of punctuation and whitespace ( , . # "" \n \s ... etc.),
but in my code I do not get the result I am expecting:
def words(lines):
    words_pattern = re.compile(r'[^-:1234567890/.,\s]*')
    li = []
    for m in lines:
        e = words_pattern.search(m)
        if e:
            match = e.group()
            li.append(match)
    return li
li = [u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'']
Any advice on this? Maybe I am not carrying the regex over correctly from one place to the other.
Thanks in advance.
EDIT
To be more precise, I do want to match ñ, á, é, í, ó and ú.
Thanks.
If you just want the ASCII letters, you could use string.ascii_letters:
>>> from string import ascii_letters
>>> import re
>>> s = 'this is 123 some text! that has someñ \n other stuff.'
>>> re.findall('[{}]+'.format(ascii_letters), s)
['this', 'is', 'some', 'text', 'that', 'has', 'some', 'other', 'stuff']
You can also get the same behavior from [A-Za-z] (which is essentially the same thing as string.ascii_letters):
>>> re.findall('[A-Za-z]+', s)
['this', 'is', 'some', 'text', 'that', 'has', 'some', 'other', 'stuff']
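Neither pattern above keeps ñ or the accented vowels mentioned in the EDIT, and neither explains the list of empty strings. The likely cause of the empty strings is that `*` also matches zero characters, so `search` succeeds at position 0 with an empty match whenever a line starts with an excluded character. The sketch below is one possible rewrite, not the original author's method: it assumes Python 3 (where `str` patterns are Unicode-aware by default; on Python 2 you would need `unicode` strings and the `re.UNICODE` flag), it uses `+` with `findall`, and it matches letters with `[^\W\d_]`. The names `WORD_PATTERN` and `sample` are made up for illustration.

```python
import re

# The original pattern: '*' allows a zero-length match, and search() returns
# the first match it finds, which is empty when the line starts with an
# excluded character such as a digit.
original = re.compile(r'[^-:1234567890/.,\s]*')
empty_match = original.search('1. hello').group()  # empty string

# [^\W\d_] reads as "a word character that is not a digit or underscore",
# which leaves letters only, including ñ, á, é, í, ó and ú.
# '+' requires at least one letter, so empty matches cannot occur.
WORD_PATTERN = re.compile(r'[^\W\d_]+')


def words(lines):
    """Collect every run of letters from every line."""
    found = []
    for line in lines:
        # findall() returns all matches in the line, not just the first one.
        found.extend(WORD_PATTERN.findall(line))
    return found


sample = ['123 - niño, café: 4/5', 'this is some text!']
print(words(sample))
```

If you would rather keep the exclusion-style character class from the question, changing `*` to `+` and `search` to `findall` should be enough to stop the empty strings, although any symbol not listed in the class (for example `#` or `"`) would then still count as part of a word.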