
Removing a string that does not contain letters from a list of strings in python

I am making a text analyzer in Python. I am trying to remove from a list any string that does not contain at least one letter or digit, but I am stuck and do not know how to do it. Currently, when I count the length of my list, it includes the string '-', and I do not want that because I don't want to count it as a word. However, I'd rather not use list.remove('-'), because I want this to work for other inputs too.

Thanks in advance.

I think what you mean is that you want to filter out the strings with no alphanumeric characters from a list of strings, so ['a','b','*'] becomes ['a','b'].

Not too hard:

In [39]: l = ['adsfg', 'sdfgb', 'gdc', '56hjfg1', '&#$%^', "asfgd3$#$%^"]
In [40]: l = list(filter(lambda s: any(c.isalnum() for c in s), l))  # list() needed in Python 3
In [41]: l
Out[41]: ['adsfg', 'sdfgb', 'gdc', '56hjfg1', 'asfgd3$#$%^']
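In Python 3, filter returns a lazy iterator rather than a list, so its result has to go through list() before you can print it or take its len(). A list comprehension does the same job and is often easier to read. A minimal sketch using the same sample data; the helper name has_alnum is a hypothetical choice for illustration:

```python
def has_alnum(s):
    """Return True if s contains at least one letter or digit."""
    return any(c.isalnum() for c in s)

l = ['adsfg', 'sdfgb', 'gdc', '56hjfg1', '&#$%^', "asfgd3$#$%^"]

# Keep only the strings that have at least one alphanumeric character
kept = [s for s in l if has_alnum(s)]
print(kept)  # ['adsfg', 'sdfgb', 'gdc', '56hjfg1', 'asfgd3$#$%^']
```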

If you want to keep every string that contains at least one alphanumeric character, even if it also contains non-alphanumeric characters:

import re

strings = ["string", "&*()£", "$^TY?", "12345", "2wE4T", r"@#~\!", "^(*4"]

# \w matches letters, digits and the underscore, so this keeps any string containing at least one of them
strings = [s for s in strings if re.search(r'\w+', s)]

print(strings)
# ['string', '$^TY?', '12345', '2wE4T', '^(*4']  (now we can work with these wanted strings)
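One caveat with the regex approach: \w also matches the underscore, so a string such as '__' would be kept even though it contains no letters or digits. If that matters, the character class [^\W_] ("a word character that is not an underscore") matches only letters and digits. A minimal sketch, with a hypothetical sample list:

```python
import re

samples = ["string", "__", "-", "a_b", "&*()"]

# \w covers letters, digits and underscore, so "__" slips through
with_w = [s for s in samples if re.search(r'\w', s)]

# [^\W_] excludes the underscore, leaving letters and digits only
letters_digits = [s for s in samples if re.search(r'[^\W_]', s)]

print(with_w)          # ['string', '__', 'a_b']
print(letters_digits)  # ['string', 'a_b']
```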

Otherwise, to keep only the strings made up entirely of alphanumeric characters:

str.isalnum() is your man:

strings = [s for s in strings if s.isalnum()]
print(strings)
# ['string', '12345', '2wE4T']
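To tie this back to the word count in the question: a minimal sketch, assuming the text is split into words on whitespace with str.split() (the asker's own tokenizing isn't shown). Tokens such as '-' are dropped without hard-coding them. The function name count_words is hypothetical:

```python
def count_words(text):
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    tokens = text.split()
    words = [t for t in tokens if any(c.isalnum() for c in t)]
    return len(words)

print(count_words("Hello - world -- 42 times !"))  # 4
```

Tokens like 'well-known' still count as one word, because they contain letters; only tokens made purely of punctuation are skipped.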

More on the re module:

https://docs.python.org/2/howto/regex.html

http://www.regular-expressions.info/tutorial.html
