I am making a text analyzer in Python. I am trying to remove from my list any string that does not contain any letters or digits. I am stuck and do not know how to do so. Currently, when I count the length of my list, it includes the string '-', and I do not want that because I don't want to count it as a word. However, I'd rather not use string.remove('-') because I want it to work for other inputs.
Thanks in advance.
I think what you mean is you want to filter out strings with no alphanumeric characters from a list of strings. So ['a','b','*'] => ['a','b']
Not too hard:
In [39]: l = ['adsfg','sdfgb','gdc','56hjfg1','&#$%^',"asfgd3$#$%^" ]
In [40]: l = list(filter(lambda s: any(c.isalnum() for c in s), l))
In [41]: l
Out[41]: ['adsfg', 'sdfgb', 'gdc', '56hjfg1', 'asfgd3$#$%^']
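Applied to the original question, the same test can be used to count words while skipping tokens such as '-'. This is only a minimal sketch: the helper names has_alnum and count_words and the sample text are made up for illustration, and splitting on whitespace is an assumption about how the asker builds the list.

```python
def has_alnum(token):
    """Return True if the token contains at least one letter or digit."""
    return any(c.isalnum() for c in token)


def count_words(text):
    """Count whitespace-separated tokens that contain a letter or digit."""
    # Assumption: words are separated by whitespace; adjust the split if not.
    return sum(1 for token in text.split() if has_alnum(token))


sample = "well - this is a test ... 42"
word_count = count_words(sample)
print(word_count)  # 6: '-' and '...' are not counted
```

Because the check looks at characters rather than comparing against '-', it also skips other punctuation-only tokens without listing them one by one.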
If you want to keep every string that contains at least one alphanumeric character, even if it also contains non-alphanumeric characters:
import re
strings = ["string", "&*()£", "$^TY?", "12345", "2wE4T", "@#~\\!", "^(*4"]
strings = [s for s in strings if re.search(r'\w', s)]  # \w matches letters, digits, and the underscore
print(strings)
['string', '$^TY?', '12345', '2wE4T', '^(*4'] # now we can work with these wanted strings
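One caveat with the regex version: \w also matches the underscore, so a string made only of underscores passes the filter. If that matters for your input, the character class [^\W_] (word characters minus the underscore) is one way around it. A small sketch, with a made-up candidates list:

```python
import re

# Hypothetical sample data chosen to show the underscore case.
candidates = ["___", "a_b", "-", "42"]

# \w treats '_' as a word character, so "___" is kept.
with_w = [s for s in candidates if re.search(r"\w", s)]

# [^\W_] means "a word character that is not an underscore".
strict = [s for s in candidates if re.search(r"[^\W_]", s)]

print(with_w)  # ['___', 'a_b', '42']
print(strict)  # ['a_b', '42']
```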
Otherwise, to keep only the strings made up entirely of alphanumeric characters, str.isalnum() is what you need:
strings = [s for s in strings if s.isalnum()]
print(strings)
['string', '12345', '2wE4T']
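Note that this stricter filter also drops words with an apostrophe or a space in them, and it drops the empty string, since "".isalnum() is False. A quick sketch with made-up sample values:

```python
# Hypothetical samples: only the first consists solely of letters and digits.
samples = ["2wE4T", "it's", "two words", "", "-"]

whole = [s for s in samples if s.isalnum()]
print(whole)  # ['2wE4T']

# all() over an empty string is True, so it is not a drop-in replacement
# for str.isalnum() when empty strings can appear in the list.
empty_all = all(c.isalnum() for c in "")
print(empty_all)  # True
```

For counting words in ordinary text, the "at least one alphanumeric character" filter is probably the better fit, because it keeps tokens like "it's".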