
How to remove special characters from a list of strings?

I am reading a file and applying a regex to its content to perform some operations. When I read the file I don't see any special characters in it, but after applying the regex and saving the matches to a list, special characters such as \t and \xa0 appear before the numbers.

Example file content:

Hydrochloric Acid to pHÂ 3.3-5.0        q.s.    q.s.    q.s.    pH-regulator    Ph Eur, NF

After applying the regex it becomes:

Hydrochloric Acid to pHÂ\xa03.3-5.0\tq.s.\tq.s.\tq.s.\tpH-regulator\tPh Eur, NF

How do I remove all of these without replacing each character individually?

Code:

import re

# `medicines` is a list of medicine names defined elsewhere (not shown in the question).
def extract(filename):
    # Read the whole file and close it again when done.
    with open(filename) as f:
        file = f.read()
    print(file)
    result = []
    med = r"(?:{})".format("|".join(map(re.escape, medicines)))
    pattern = re.compile(r"^\s*" + med + r".*(?:\n[^\w\n]*\d*\.?\d+[^\w\n]*(?:\n.*){2})?", re.M|re.IGNORECASE)
    result = pattern.findall(file)
#    result.encode('ascii', 'ignore')
    newresult = []
    for line in result:
        newresult.append(line.strip())
    return newresult

The newresult list contains all these special characters, which are not present in the original file.

If you know all of these special characters in advance, you can use the maketrans and translate methods of str to replace them with spaces, as follows:

txt = 'Hydrochloric Acid to pHÂ\xa03.3-5.0\tq.s.\tq.s.\tq.s.\tpH-regulator\tPh Eur, NF'
t = ''.maketrans('\xa0\t','  ')
newtxt = txt.translate(t)
print(newtxt)

Output

Hydrochloric Acid to pHÂ 3.3-5.0 q.s. q.s. q.s. pH-regulator Ph Eur, NF

maketrans accepts two or three string arguments (a single dict is also accepted). It builds a translation table that can then be passed to translate, which works as follows: every character in the first argument of maketrans is replaced with the character at the same position in the second argument (so the two must have equal length), and every character in the third argument is removed. In the example above, \xa0 and \t are each replaced with a space.
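The three-argument form described above can be applied to a whole list in one pass. The following is a minimal sketch, not the asker's actual data: the sample list, the name `table`, and the helper `clean_all` are made up for illustration, and the carriage return is included only to show the third (delete) argument.

```python
# Hypothetical sample list, loosely modelled on the string from the question.
lines = [
    'Hydrochloric Acid to pH\xa03.3-5.0\tq.s.\tq.s.\tpH-regulator',
    '\tSodium Chloride\xa09.0 mg\r',
]

# Arguments 1 and 2: map no-break space and tab to a plain space.
# Argument 3: characters to delete outright (here, carriage returns).
table = str.maketrans('\xa0\t', '  ', '\r')

def clean_all(strings):
    """Apply the same translation table to every string in the list."""
    return [s.translate(table).strip() for s in strings]

cleaned = clean_all(lines)
print(cleaned)
# ['Hydrochloric Acid to pH 3.3-5.0 q.s. q.s. pH-regulator', 'Sodium Chloride 9.0 mg']
```

Building the table once and reusing it avoids chaining a separate replace call per character, which seems to be what the question wants to avoid.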


Hi,

Can you check your code under a different Python version? It seems to work without error on 3.8.0.

import re

def extract(filename):
    # The file content is hard-coded so the snippet runs without an input file.
    file='Hydrochloric Acid to pHÂ 3.3-5.0        q.s.    q.s.    q.s.    pH-regulator    Ph Eur, NF'
    result = []
    # No medicines list is available here, so the alternation is built
    # from the individual characters of the sample string.
    med = r"(?:{})".format("|".join(map(re.escape, file)))
    pattern = re.compile(r"^\s*" + med + r".*(?:\n[^\w\n]*\d*\.?\d+[^\w\n]*(?:\n.*){2})?", re.M|re.IGNORECASE)
    result = pattern.findall(file)
    #result.encode('ascii', 'ignore')
    newresult = []
    for line in result:
        newresult.append((line.strip()))
    print(file)
    print (newresult)
    return newresult
extract('test')
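The difference between the two runs may have less to do with the Python version than with how strings are displayed. This is an assumption, since the original file is not shown: if the file contains real tab and no-break space characters, printing the string renders them as blank space, while printing a list shows the repr of each element, where they appear as \t and \xa0. The hard-coded string above appears to contain only ordinary spaces, so nothing would be escaped. The sketch below uses a made-up sample string and a hypothetical helper, `normalize_whitespace`, to show both the effect and one way to collapse such characters without listing them.

```python
# Assumed sample: a string holding a real no-break space and real tabs.
line = 'pH\xa03.3-5.0\tq.s.\tNF'

print(line)    # the tab and no-break space are rendered as blank space
print([line])  # a list shows repr() of each element: ['pH\xa03.3-5.0\tq.s.\tNF']

# The characters are the same in both cases; only the display differs.
assert '\t' in line and '\xa0' in line

def normalize_whitespace(s):
    """Collapse every run of whitespace (tabs, no-break spaces, ...) to one space."""
    return ' '.join(s.split())

print([normalize_whitespace(line)])  # ['pH 3.3-5.0 q.s. NF']
```

Because str.split() with no arguments splits on any Unicode whitespace, this variant does not need the characters to be known in advance, at the cost of also collapsing runs of ordinary spaces.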
