
Strip all non alphabetic characters from beginning of Python string w/o using RegEx

I have several word lists, many of which are noisy. By noisy I mean that a word begins with non-alphabetic characters such as '"' or '-', e.g. "thisword, -thisword, -"this word, .thisword, and there can be several others.

For example, we can remove leading ASCII letters by using

from string import ascii_letters
s.lstrip(ascii_letters)  # s is the word to clean

Is there a similar method in Python that can handle the non-alphabetic characters without using a regular expression?

Thanks!

Why don't you use string.punctuation?

>>> from string import punctuation
>>> "-asdf".lstrip(punctuation)
'asdf'
>>> "'asdf".lstrip(punctuation)
'asdf'
>>> '"asdf'.lstrip(punctuation)
'asdf'
>>> ',asdf'.lstrip(punctuation)
'asdf'

Keep only the letters in the word:

"".join([x for x in word if x.isalpha()])
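Note that this filter drops every non-letter, not only the leading ones, so punctuation inside the word disappears as well. A small sketch of that behaviour (the sample word is made up for illustration):

```python
word = "-it's"

# Keep letters only, wherever the other characters occur in the word.
letters_only = "".join(x for x in word if x.isalpha())

print(letters_only)  # its
```

If interior apostrophes or hyphens matter, one of the approaches that only trims the left side is probably the better fit.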

Using itertools.dropwhile:

>>> import itertools
>>> def removes(s):
...     return "".join(itertools.dropwhile(lambda x:not x.isalnum(),s))
... 
>>> removes("---thisword")
'thisword'
>>> removes("-^--thisword")
'thisword'
>>> removes("thisword")
'thisword'
>>> removes("...thisword")
'thisword'

Negate character set:

>>> from string import ascii_letters
>>> non_letter = ''.join(set(map(chr, range(128))) - set(ascii_letters))
>>> s = '-hello'
>>> s.lstrip(non_letter)
'hello'

I would suggest a while loop that trims each string until it hits an ASCII letter. Load the non-letter characters into a list, then scan from the start of the string until you hit a letter. Implement it as a function so that the task is abstracted away.
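A minimal sketch of that idea, under one assumption: instead of keeping an explicit list of non-letter characters, it tests each character with str.isalpha(), which also treats accented and other non-ASCII letters as letters. The function name strip_leading_non_letters is just illustrative.

```python
def strip_leading_non_letters(word):
    """Return word without its leading non-letter characters."""
    i = 0
    # Advance past every leading character that is not a letter.
    while i < len(word) and not word[i].isalpha():
        i += 1
    return word[i:]


print(strip_leading_non_letters('-"thisword'))  # thisword
print(strip_leading_non_letters('.thisword'))   # thisword
```

A string with no letters at all comes back empty, and anything after the first letter is left untouched.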

Hope that helps.
