Strip all non-alphabetic characters from the beginning of a Python string without using regex

Question

I have several word lists, many of which are noisy. By noisy I mean that a word begins with non-alphabetic characters such as " or -. For example: "thisword, -thisword, -"this word, .thisword, and there can be several other variants.

Just as we can strip leading ASCII letters using

from string import ascii_letters
s.lstrip(ascii_letters)

is there a similar method in Python that can strip the leading non-letter characters instead, without using regular expressions?

Thanks!

Answer 1

Why don't you use string.punctuation?

>>> from string import punctuation
>>> "-asdf".lstrip(punctuation)
'asdf'
>>> "'asdf".lstrip(punctuation)
'asdf'
>>> '"asdf'.lstrip(punctuation)
'asdf'
>>> ',asdf'.lstrip(punctuation)
'asdf'
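
One caveat: string.punctuation holds only the ASCII punctuation characters, so leading digits, whitespace, or non-ASCII symbols are left in place. A minimal sketch of widening the strip set, assuming the noise is limited to ASCII (the names LEADING_JUNK and strip_leading_junk are made up for illustration):

```python
from string import digits, punctuation, whitespace

# Every ASCII character class that is not a letter.
# Assumption: the noise is ASCII only; a leading non-ASCII symbol is NOT stripped.
LEADING_JUNK = punctuation + digits + whitespace


def strip_leading_junk(word):
    """Strip leading ASCII punctuation, digits and whitespace from word."""
    return word.lstrip(LEADING_JUNK)
```

With this, strip_leading_junk(' -"this word') gives 'this word', while a word that starts with a non-ASCII quote such as « comes back unchanged.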

Answer 2

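Keep only the letters in the word (note that this drops non-letters everywhere in the word, not just at the beginning):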

"".join([x for x in word if x.isalpha()])

Answer 3

Using itertools.dropwhile:

>>> import itertools
>>> def removes(s):
...     # Drop leading characters until the first letter or digit.
...     return "".join(itertools.dropwhile(lambda x: not x.isalnum(), s))
... 
>>> removes("---thisword")
'thisword'
>>> removes("-^--thisword")
'thisword'
>>> removes("thisword")
'thisword'
>>> removes("...thisword")
'thisword'
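
Since the function above tests isalnum(), leading digits are kept. If digits should go too, a variant with isalpha() might look like the sketch below, applied to a whole word list as in the question (strip_leading_non_alpha and clean_words are illustrative names, not from the answer):

```python
from itertools import dropwhile


def strip_leading_non_alpha(word):
    """Drop characters from the front of word until the first letter."""
    return "".join(dropwhile(lambda ch: not ch.isalpha(), word))


def clean_words(words):
    """Apply strip_leading_non_alpha to every word in an iterable."""
    return [strip_leading_non_alpha(w) for w in words]


noisy = ['"thisword', '-thisword', '-"this word', '.thisword', '42abc']
print(clean_words(noisy))
# ['thisword', 'thisword', 'this word', 'thisword', 'abc']
```

In Python 3, str.isalpha() is Unicode-aware, so accented and non-Latin letters count as letters here.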

Answer 4

Negate the character set, i.e. build the set of all ASCII characters that are not letters and strip those:

>>> from string import ascii_letters
>>> non_letter = ''.join(set(map(chr, range(128))) - set(ascii_letters))
>>> s = '-hello'
>>> s.lstrip(non_letter)
'hello'

Answer 5

I would suggest a while loop that trims each string until it hits an ASCII letter. Load the non-letter characters into a list, then scan from the start of the string until you hit a letter. Implement it as a function so that the task is abstracted away.
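
A rough sketch of that loop, assuming "letter" means an ASCII letter. It checks membership in the set of letters rather than in a list of non-letters, which amounts to the same test. The function name and its letters parameter are invented for this example:

```python
from string import ascii_letters


def trim_leading_non_letters(word, letters=ascii_letters):
    """Return word without its leading run of non-letter characters."""
    i = 0
    # Advance past every leading character that is not in the letters set.
    while i < len(word) and word[i] not in letters:
        i += 1
    return word[i:]
```

For example, trim_leading_non_letters('-"this word') returns 'this word', and a string with no letters at all comes back empty.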

Hope that helps.

solution1
3 ACCEPTED 2014-11-29 07:14:52

solution2
2 2014-11-29 07:13:10

solution3
2 2014-11-29 07:40:08

solution4
1 2014-11-29 07:14:00

solution5
0 2014-11-29 07:13:03
