Censoring a text string using a dictionary and replacing words with Xs. Python

Question

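I'm trying to write a simple program that takes a string of text t and a list of words l and prints the text, but with the words in l replaced by a number of Xs matching the number of letters in the word.

Problem: my code also replaces parts of words that match the words in l. How can I make it target only whole words?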

def censor(t, l):

    for cenword in l:
        number_of_X = len(cenword)
        sensurliste = {cenword : ("x"*len(cenword))}

        for cenword, x in sensurliste.items():
            word = t.replace(cenword, x)
            t = word.replace(cenword, x)

    print (word)

Answer 1

Another approach is to use a regular expression to pick out all the words:

import re

blacklist = ['ccc', 'eee']

def replace(match):
    word = match.group()
    if word.lower() in blacklist:
        return 'x' * len(word)
    else:
        return word

text = 'aaa bbb ccc. ddd eee xcccx.'

text = re.sub(r'\b\w*\b', replace, text, flags=re.I|re.U)
print(text)

This has the advantage of handling the various kinds of word boundary that regular expressions recognise.
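As a variation on the same idea, the sketch below builds a single pattern from the banned list instead of checking every word in a callback. The function name censor and its parameters are illustrative, and it assumes each banned word consists only of word characters, since \b behaves differently next to punctuation.

```python
import re

def censor(text, banned):
    """Replace each whole-word occurrence of a banned word with x's."""
    if not banned:
        return text
    # Escape each word so regex metacharacters are treated literally;
    # try longer words first so they win over shorter alternatives.
    ordered = sorted(banned, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", flags=re.IGNORECASE)
    # Replace each match with as many x's as the matched text has characters.
    return pattern.sub(lambda m: "x" * len(m.group()), text)

print(censor("aaa bbb ccc. ddd eee xcccx.", ["ccc", "eee"]))
# aaa bbb xxx. ddd xxx xcccx.
```

The final xcccx is left alone because there is no word boundary between x and c.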

Answer 2

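First, I believe you want your for loops at the same level, so that one starts when the other finishes.

Second, it looks like you have extra code that doesn't actually do anything.

For example, sensurliste will only ever hold the censored word paired with its "X" string. So the first for loop isn't needed, because it is trivial to create the "X" string on the spot in the second for loop.

Then you write

word = t.replace(cenword, x)
t = word.replace(cenword, x)

The second line does nothing, because word already has every instance of cenword replaced. So this can be shortened to

t = t.replace(cenword, x)

Finally, this is where your problem lies: Python's replace method doesn't care about word boundaries. So it replaces every instance of cenword, whether or not it is a whole word.

You could use a regular expression so that it only finds whole-word instances, but I would just use something more like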
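A small illustration of that behaviour, using a made-up sentence rather than the asker's data: str.replace matches the substring wherever it occurs, including inside longer words.

```python
text = "the cat sat in the category"

# str.replace knows nothing about words, only about substrings
censored = text.replace("cat", "xxx")

print(censored)
# the xxx sat in the xxxegory
```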

def censort(t, l):
    words = t.split()                        # split the text into a list of words
    for i in range(len(words)):              # for each word in the text
        if words[i] in l:                    # if it needs to be censored
            words[i] = "X" * len(words[i])   # replace it with X's
    return " ".join(words)                   # rejoin the list into a string

Answer 3

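You can use a regular expression (the re module) to do the replacement, or you can split the input string into whatever you consider to be "whole words".

If you treat anything separated by whitespace as a word, you can do the following: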

def censor(t, l):
    # map every censored word to a string of x's of the same length
    sensurliste = {cenword: "x" * len(cenword) for cenword in l}
    censored = []
    for word in t.split():
        # keep the word unless it appears in the censor dictionary
        censored.append(sensurliste.get(word, word))
    return ' '.join(censored)

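Note that this does not preserve the original spacing. Also, if your text contains punctuation, this may not produce what you think it should. For example, if t contains the word 'stupid!' but the list only has 'stupid', it will not be replaced.

If you want to address all of these issues, you will need to perform tokenization. You may also need to take capitalized words into account.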
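One way to sketch such a tokenization, assuming a "word" is a run of \w characters and everything else is a separator: re.split with a capturing group keeps the separators, so spacing and punctuation survive the round trip. The name censor_tokens is made up for this illustration.

```python
import re

def censor_tokens(text, banned):
    """Censor whole words, keeping spacing and punctuation unchanged."""
    banned_lower = {w.lower() for w in banned}
    # The capturing group makes re.split return the separators as well,
    # so joining the pieces reproduces the original layout.
    tokens = re.split(r"(\W+)", text)
    return "".join(
        "x" * len(tok) if tok.lower() in banned_lower else tok
        for tok in tokens
    )

print(censor_tokens("Stupid  idea, stupid! Not stupidity.", ["stupid"]))
# xxxxxx  idea, xxxxxx! Not stupidity.
```

The comparison is done in lower case, which covers the capitalized "Stupid", while "stupidity" is a different token and stays as it is.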

Answer 4

This is clean and easy to understand:

def censor(text, word):
    return text.replace(word, "*" * len(word))

Answer 5

I made it a bit more compact:

def censor_string(text, banned_words, replacer):
    return "".join([x + " " if x.lower() not in banned_words else replacer*len(x) + " " for x in text.split(" ") ])

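But I ran into a problem with special symbols such as "?" or commas. If I run the function like this:

censor_string("Today is a Wednesday!", ["is", "Wednesday"], "*")

I get "Today ** a Wednesday!" instead of "Today ** a *********!"

Any ideas on how to skip or ignore anything other than letters and digits in the string?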
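One possible way to do this, offered as a sketch rather than a definitive fix: strip leading and trailing punctuation from each token before comparing, then censor only the stripped core. It keeps the same signature as above and assumes ASCII punctuation as defined by string.punctuation.

```python
import string

def censor_string(text, banned_words, replacer):
    banned = {w.lower() for w in banned_words}
    out = []
    for token in text.split(" "):
        # Drop punctuation at both ends, e.g. "Wednesday!" -> "Wednesday"
        core = token.strip(string.punctuation)
        if core and core.lower() in banned:
            # Censor only the core; surrounding punctuation is kept
            token = token.replace(core, replacer * len(core), 1)
        out.append(token)
    return " ".join(out)

print(censor_string("Today is a Wednesday!", ["is", "Wednesday"], "*"))
# Today ** a *********!
```

Punctuation inside a word, such as the apostrophe in "don't", is left untouched, so such a word only matches if the banned list contains it in exactly that form.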

Answer 6

def censor_string(text, censorlst, replacer):

    word_list = text.split()
    for censor in censorlst:
        index = 0
        for word in word_list:
            # exact match, ignoring case
            if censor.lower() == word.lower():
                ch = len(censor) * replacer
                word_list[index] = ch
            # match with one trailing character (e.g. punctuation), which is kept
            elif censor.lower() == word[0:-1].lower():
                ch = len(censor) * replacer
                word_list[index] = ch + word[-1]
            index += 1

    return " ".join(word_list)

print(censor_string('Today is a Wednesday!', ['Today', 'a'], '-'))
print(censor_string('The cow jumped over the moon.', ['cow', 'over'], '*'))
print(censor_string('Why did the chicken cross the road?', ['Did', 'chicken','road'], '*'))

6 answers

Answer 1  2  2013-05-21 17:40:35

Answer 2  1  accepted  2013-05-21 17:21:51

Answer 3  0  2013-05-21 17:31:02

Answer 4  0  2014-08-13 17:47:29

Answer 5  0  2020-03-02 18:42:32

Answer 6  0  2021-05-15 19:10:59
