
file.replace('abcd') also replaces 'abcde'. How do I replace only the exact value?

def censor2(filename):
    with open(filename, 'r') as infile:
        contents = infile.read()
    contentlist = contents.split()
    print(contents)
    print(contentlist)
    for letter in contentlist:
        if len(letter) == 4:
            print(letter)
            contents = contents.replace(letter, 'xxxx')
    with open('censor.txt', 'w') as outfile:
        outfile.write(contents)

This code works in Python. It accepts a file such as 'example.txt', reads it, loops through the words, replaces every 4-letter word with the string 'xxxx', and writes the result (keeping the original formatting!) to a new file called censor.txt.

I used the replace function to substitute the words I found. However, when the word 'abcd' is replaced, the next word 'abcde' is also turned into 'xxxxe'.

How do I prevent 'abcde' from being changed?

I could not get the examples below to work, but after experimenting with re.sub from the re module I found that the following code replaces only 4-letter words and leaves 5-letter words alone.

contents = re.sub(r"\b\w{4}\b", "xxxx", contents)
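
Dropped into the original function, that substitution could look roughly like the sketch below. The names censor_four_letter_words and censor_file and the out_path parameter are made up for illustration; only the pattern and the 'censor.txt' file name come from the question.

```python
import re


def censor_four_letter_words(text):
    """Replace every standalone run of exactly four word characters with 'xxxx'."""
    # \b anchors both ends at a word boundary, so 'abcde' is never touched.
    return re.sub(r"\b\w{4}\b", "xxxx", text)


def censor_file(in_path, out_path="censor.txt"):
    """Hypothetical wrapper: read in_path, censor it, write the result to out_path."""
    with open(in_path, "r") as infile:
        contents = infile.read()
    # Only the matched words change; spaces and line breaks are preserved.
    with open(out_path, "w") as outfile:
        outfile.write(censor_four_letter_words(contents))
```

Note that \w also matches digits and underscores, so a token such as '1234' would be censored too. If only letters should count, a character class like [A-Za-z] may be a better fit.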

How about:

re.sub(r'\babcd\b','',my_text)

This requires a word boundary on either side of the match.

This is where regular expressions can be helpful. You would want something like this:

import re
...
contents = re.sub(r'\babcd\b', 'xxxx', contents)
....

The \b is the "word boundary" marker. It matches the position where a word character meets a non-word character such as whitespace or punctuation, or the start or end of the string.
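
As a small illustration of the difference, the following sketch runs both approaches on a made-up sample string (the sample text and variable names are only for demonstration):

```python
import re

text = "abcd abcde (abcd) xabcd abcd."

# With word boundaries: only standalone 'abcd' is replaced.
bounded = re.sub(r"\babcd\b", "xxxx", text)

# Plain str.replace: every occurrence of the substring is replaced.
plain = text.replace("abcd", "xxxx")

print(bounded)  # xxxx abcde (xxxx) xabcd xxxx.
print(plain)    # xxxx xxxxe (xxxx) xxxxx xxxx.
```

Punctuation such as the parentheses and the trailing period still counts as a boundary, which is why '(abcd)' and 'abcd.' are both replaced.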

You'll need the r'' style string for the regex pattern so that the backslashes are not treated as escape characters.
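
To see why the raw string matters here, this short sketch compares the two spellings (variable names are illustrative only). In an ordinary string literal, "\b" is the backspace character, so the regex engine never sees a word-boundary marker:

```python
import re

plain_literal = "\b"   # one character: backspace (0x08)
raw_literal = r"\b"    # two characters: a backslash followed by 'b'

print(len(plain_literal), len(raw_literal))  # 1 2

# Without the r prefix the pattern looks for literal backspace characters.
no_match = re.search("\babcd\b", "abcd")

# With the r prefix the regex engine receives \b and treats it as a word boundary.
match = re.search(r"\babcd\b", "abcd")

print(no_match)        # None
print(match.group(0))  # abcd
```

Doubling the backslashes ("\\babcd\\b") gives the same pattern as the raw string, but it is harder to read.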
