Removing unwanted characters from a string in Python

I have some strings from which I want to delete some unwanted characters. For example: Adam'sApple ----> AdamsApple (case insensitive). Can someone help me? I need the fastest way to do it, because I have a couple of million records that have to be polished. Thanks

One simple way:

>>> s = "Adam'sApple"
>>> x = s.replace("'", "")
>>> print(x)
AdamsApple

... or take a look at regex substitutions.

Any characters in the second argument of the translate method are deleted (this two-argument form is Python 2's str.translate):

>>> "Adam's Apple!".translate(None,"'!")
'Adams Apple'

NOTE: translate requires Python 2.6 or later to use None for the first argument, which otherwise must be a translation string of length 256. string.maketrans('', '') can be used in place of None for pre-2.6 versions.
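
On Python 3 the two-argument call shown above no longer works, because str.translate takes a single mapping table. A minimal sketch of the equivalent, assuming Python 3 (the variable names are only illustrative):

```python
# Python 3: the third argument of str.maketrans lists the characters to delete.
delete_table = str.maketrans("", "", "'!")

cleaned = "Adam's Apple!".translate(delete_table)
print(cleaned)  # Adams Apple
```

The table can be built once and reused for every record, which matters when there are millions of strings to process.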

Here is a function that removes all the irritating ASCII characters; the only exception is "&", which is replaced with "and". I use it to police a filesystem and ensure that all of the files adhere to the file naming scheme I insist everyone uses.

def cleanString(incomingString):
    newstring = incomingString
    newstring = newstring.replace("!","")
    newstring = newstring.replace("@","")
    newstring = newstring.replace("#","")
    newstring = newstring.replace("$","")
    newstring = newstring.replace("%","")
    newstring = newstring.replace("^","")
    newstring = newstring.replace("&","and")
    newstring = newstring.replace("*","")
    newstring = newstring.replace("(","")
    newstring = newstring.replace(")","")
    newstring = newstring.replace("+","")
    newstring = newstring.replace("=","")
    newstring = newstring.replace("?","")
    newstring = newstring.replace("\'","")
    newstring = newstring.replace("\"","")
    newstring = newstring.replace("{","")
    newstring = newstring.replace("}","")
    newstring = newstring.replace("[","")
    newstring = newstring.replace("]","")
    newstring = newstring.replace("<","")
    newstring = newstring.replace(">","")
    newstring = newstring.replace("~","")
    newstring = newstring.replace("`","")
    newstring = newstring.replace(":","")
    newstring = newstring.replace(";","")
    newstring = newstring.replace("|","")
    newstring = newstring.replace("\\","")
    newstring = newstring.replace("/","")        
    return newstring
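
The same cleanup can probably be done in a single pass with a translation table instead of a chain of replace calls. This is only a sketch, assuming Python 3; the names UNWANTED, CLEAN_TABLE and clean_string_table are hypothetical and not part of the answer above:

```python
# Characters deleted outright (the same set as the chain of replace() calls).
UNWANTED = "!@#$%^*()+=?'\"{}[]<>~`:;|\\/"

# Map each unwanted character to None (delete it); "&" becomes "and".
CLEAN_TABLE = {ord(ch): None for ch in UNWANTED}
CLEAN_TABLE[ord("&")] = "and"

def clean_string_table(incoming_string):
    """Hypothetical table-driven variant of cleanString."""
    return incoming_string.translate(CLEAN_TABLE)

print(clean_string_table("Q&A: notes (draft)?.txt"))  # QandA notes draft.txt
```

Keeping the character set in one string also makes the naming rules easier to review than 28 separate lines.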

Try:

"Adam'sApple".replace("'", '')

One step further, to replace multiple characters with nothing:

import re
print(re.sub(r'''['"x]''', '', '''a'"xb'''))

Yields:

ab

s.replace("'", "")   # where s is your string
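
For the regex approach with several characters, the character class can be built from a plain string and compiled once, which may help when the same cleanup runs over millions of records. A minimal sketch, assuming a non-empty set of characters; make_remover is a hypothetical helper name:

```python
import re

def make_remover(unwanted_chars):
    """Return a function that deletes every character in unwanted_chars.

    Assumes unwanted_chars is a non-empty string.
    """
    # re.escape keeps characters such as ']' or '^' literal inside the class.
    pattern = re.compile("[" + re.escape(unwanted_chars) + "]")
    return lambda text: pattern.sub("", text)

remove_quotes_and_x = make_remover("'\"x")
print(remove_quotes_and_x("a'\"xb"))  # ab
```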

As has been pointed out several times now, you have to use either replace or regular expressions (most likely you don't need regexes, though). But if you also have to make sure that the resulting string is plain ASCII (doesn't contain funky characters like é, ò, µ, æ or φ), you could finally do:

>>> u'(like é, ò, µ, æ or φ)'.encode('ascii', 'ignore')
'(like , , ,  or )'
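
The session above is Python 2, where encode returns a str. On Python 3 the same call returns bytes, so a decode step is needed to get text back. A small sketch, assuming Python 3:

```python
text = "(like é, ò, µ, æ or φ)"

# encode() drops every character that has no ASCII encoding and returns bytes;
# decode() turns the result back into a str.
ascii_only = text.encode("ascii", "ignore").decode("ascii")
print(ascii_only)  # (like , , ,  or )
```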

An alternative that takes a string and an array of unwanted chars:

# Function that removes unwanted signs from a string.
# Pass the string and an array of unwanted chars.
def removeSigns(text, arrayOfChars):
    newstr = ""
    for letter in text:
        # keep the letter only if it is not one of the unwanted chars
        if letter not in arrayOfChars:
            newstr += letter
    return newstr
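
The same idea can be written more compactly with a set and str.join, which avoids rebuilding the string on every iteration. This is a sketch rather than the answer's own code; remove_signs is a hypothetical name:

```python
def remove_signs(text, unwanted_chars):
    """Hypothetical compact variant: drop every character in unwanted_chars."""
    unwanted = set(unwanted_chars)  # set membership tests are O(1) on average
    return "".join(ch for ch in text if ch not in unwanted)

print(remove_signs("Adam's Apple!", ["'", "!"]))  # Adams Apple
```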

Let's say we have the following list:

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'south carolina##', 'West virginia?']

Now we will define a function clean_strings():

import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

When we call clean_strings(states), the result will look like:

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'South Carolina',
 'West Virginia']

I am probably late with this answer, but I think the code below would also work (as an extreme measure); it replaces every character other than letters, digits, '.' and '?' with a space:

import re

a = '; niraj kale 984wywn on 2/2/2017'
a = re.sub('[^a-zA-Z0-9.?]', ' ', a)
a = a.replace('  ', ' ').strip()

which will give:

'niraj kale 984wywn on 2 2 2017'
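
Note that replace('  ', ' ') runs only once, so longer runs of spaces are shortened but not fully collapsed. A variation that replaces each run of unwanted characters with a single space might look like this; it is a sketch, and squash_to_words is a hypothetical name:

```python
import re

def squash_to_words(text):
    """Hypothetical helper: replace each run of characters other than
    letters, digits, '.' and '?' with one space, then trim the ends."""
    return re.sub(r"[^a-zA-Z0-9.?]+", " ", text).strip()

print(squash_to_words("; niraj kale 984wywn on 2/2/2017"))
# niraj kale 984wywn on 2 2 2017
```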
