How do you filter a string to only contain letters?

How do I write a function that filters out all the non-letters from a string? For example, letters("jajk24me") should return "jajkme". (It needs to use a for loop.) Will the string.isalpha() function help me with this?

My attempt:

def letters(input):
    valids = []
    for character in input:
        if character in letters:
            valids.append( character)
    return (valids)

If it needs to be a for loop, and a regular expression won't do, then this small modification of your loop will work:

def letters(input):
    valids = []
    for character in input:
        if character.isalpha():
            valids.append(character)
    return ''.join(valids)

(The ''.join(valids) at the end takes all of the characters you have collected in the list and joins them into a single string. Your original function returned the list of characters instead.)
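To make the difference between the two return values concrete, here is a minimal sketch. The helper names letters_as_list and letters_as_str are made up for illustration and are not part of the answer above.

```python
def letters_as_list(text):
    # Mirrors the original return value: a list of single characters.
    return [ch for ch in text if ch.isalpha()]

def letters_as_str(text):
    # Joining with an empty separator glues the characters into one string.
    return ''.join(letters_as_list(text))

print(letters_as_list("jajk24me"))  # ['j', 'a', 'j', 'k', 'm', 'e']
print(letters_as_str("jajk24me"))   # jajkme
```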

You can also use filter to keep only the alphabetic characters of a string:

def letters(input):
    return ''.join(filter(str.isalpha, input))

or with a list comprehension:

def letters(input):
    return ''.join([c for c in input if c.isalpha()])

Or you could use a regular expression, as others have suggested:

import re
valids = re.sub(r"[^A-Za-z]+", '', my_string)
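One thing to keep in mind when choosing between the two approaches: the character class [^A-Za-z] only knows the ASCII letters, while str.isalpha() accepts any character that Unicode classifies as a letter. The sketch below illustrates this; the function names letters_ascii and letters_unicode and the sample string are assumptions made for the example.

```python
import re

def letters_ascii(text):
    # Regex version: keeps only the ASCII letters A-Z and a-z.
    return re.sub(r"[^A-Za-z]+", "", text)

def letters_unicode(text):
    # str.isalpha() version: keeps any character Unicode treats as a letter.
    return "".join(ch for ch in text if ch.isalpha())

sample = "caf\u00e924"  # "café24" with a precomposed e-acute
print(letters_ascii(sample))    # caf
print(letters_unicode(sample))  # café
```

For plain ASCII input such as "jajk24me" both versions return the same result.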

EDIT: If it needs to be a for loop, something like this should work:

output = ''
for character in input:
    if character.isalpha():
        output += character

See re.sub; for performance, consider re.compile so the pattern is compiled only once.
Below is a short version that matches every run of characters outside the range A to Z and replaces it with the empty string. The re.I flag ignores case, so lowercase letters (a to z) are left in place as well.

import re

def charFilter(myString):
    return re.sub('[^A-Z]+', '', myString, flags=re.I)
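The re.compile suggestion could look roughly like this. It is only a sketch: the names NON_LETTERS and char_filter are hypothetical, and since the re module already caches recently used patterns, the gain is likely to be small unless the function is called very often.

```python
import re

# Compiled once when the module is loaded; re.I lets A-Z also cover a-z.
NON_LETTERS = re.compile(r"[^A-Z]+", re.I)

def char_filter(text):
    # Replace every run of non-letters with the empty string.
    return NON_LETTERS.sub("", text)

print(char_filter("jajk24me"))  # jajkme
```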

If you really need that loop, there are many answers here that explain it specifically. However, you might want to give a reason why you need a loop.

If you want to operate on the number sequences, and that's the reason for the loop, consider replacing the replacement string parameter with a function like:

import re

def numberPrinter(match):
    # re.sub passes a match object; group() is the matched text
    print(match.group())
    return ''

def charFilter(myString):
    return re.sub('[^A-Z]+', numberPrinter, myString, flags=re.I)
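When a function is passed as the replacement, re.sub calls it with a match object for each match and uses the returned string as the replacement. As a variation on the idea above, this sketch collects the removed runs instead of printing them; the name split_letters and its return shape are assumptions for illustration.

```python
import re

def split_letters(text):
    removed = []

    def collect(match):
        # match.group() is the run of non-letters that was matched.
        removed.append(match.group())
        return ""  # replace the run with nothing

    kept = re.sub(r"[^A-Za-z]+", collect, text)
    return kept, removed

print(split_letters("jajk24me7"))  # ('jajkme', ['24', '7'])
```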

The method str.isalpha() checks whether a string consists only of alphabetic characters. You can use it to check whether any modification is needed. As for the other part of the question, pst is right. You can read about regular expressions in the Python docs: http://docs.python.org/library/re.html. They might seem daunting, but they are really useful once you get the hang of them.
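Using isalpha() on the whole string as a check before doing any work might look like the sketch below. This is one possible reading of the suggestion, not the answerer's own code. Note that an empty string is not considered alphabetic, so it simply falls through to the filtering step.

```python
def letters(text):
    # Fast path: the whole string is already letters, nothing to strip.
    if text.isalpha():
        return text
    # Otherwise keep only the alphabetic characters, in order.
    return "".join(ch for ch in text if ch.isalpha())

print(letters("jajkme"))    # jajkme
print(letters("jajk24me"))  # jajkme
```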

Of course you can use isalpha. Also, valids can be a string.

Here you go:

def letters(input):
    valids = ""
    for character in input:
        if character.isalpha():
            valids += character
    return valids

This one doesn't use a for loop, but that approach has already been thoroughly covered.

This might be a little late, and I'm not sure about performance, but I just thought of this solution, which seems pretty nifty:

set(x).intersection(y)

You could use it like:

from string import ascii_letters

def letters(string):
    return ''.join(set(string).intersection(ascii_letters))

NOTE: This will not preserve the original order of the characters, and because a set holds each character only once, repeated letters are dropped too. In my use case that is fine, but be warned.
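If order and repeated letters matter, the set can still be used for the membership test while the string itself is walked in order. A minimal sketch, assuming ASCII letters are all that is wanted; the constant name ALLOWED is made up for the example.

```python
from string import ascii_letters

# A frozenset gives fast membership tests for the 52 ASCII letters.
ALLOWED = frozenset(ascii_letters)

def letters(text):
    # Walk the string in order so position and duplicates are preserved.
    return "".join(ch for ch in text if ch in ALLOWED)

print(letters("jajk24me"))  # jajkme
```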
