
Finding every third word in a sentence and replacing only its letters with the # symbol

Here is my code:

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for word in redacted_sentence[2::3]:
        for i in word:
            if i.isalpha():
                word.replace(i, '#')
            else:
                continue
    return " ".join(redacted_sentence)

If the input is sentence = "You cannot drink the word 'water'." the output should be "You cannot ##### the word '#####'."

The output I get is just the words of my input, unchanged.

We can do this with a bit of Python magic:

def redact_words(sentence):
    redacted_sentence = []
    sentence = sentence.split()
    for pos, word in enumerate(sentence, start=1):
        if pos % 3 == 0:
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)

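First, we create a list to hold the words of the new sentence. Next, after splitting the sentence into a list, we use enumerate to generate the position of each word in the sentence along with the word itself. By starting the count at 1, we can use the modulo operator to check whether the position is evenly divisible by 3. If it is, we use a generator expression inside "".join to replace every alphabetic character in word with #, leaving all other characters alone, and reassign the result to word. Finally, we append the word to the redacted_sentence list, whether or not it was changed, and return a single string with all the words joined by spaces.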
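To make the two moving parts of that explanation concrete, here is a minimal sketch that isolates them: which positions `enumerate(..., start=1)` combined with `% 3` selects, and what the masking expression does to a single word. The helper name `mask_letters` is hypothetical and introduced only for illustration.

```python
def mask_letters(word):
    # Replace alphabetic characters with '#', leave everything else untouched
    return "".join("#" if ch.isalpha() else ch for ch in word)

words = "You cannot drink the word 'water'.".split()

# Positions are counted from 1; every position divisible by 3 is selected
selected = [(pos, word) for pos, word in enumerate(words, start=1) if pos % 3 == 0]

print(selected)                   # [(3, 'drink'), (6, "'water'.")]
print(mask_letters("'water'."))   # '#####'.
```

Note that the punctuation around 'water' survives because only characters for which isalpha() is true are replaced.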

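You have two problems that boil down to the same thing: creating and then discarding modified copies of a string while the original stays unchanged. The result of str.replace must be assigned somewhere useful (in this case, usually back to word), and to update the original list you must also assign to the index in the list. word is a separate alias for the object stored in the list, but reassigning word only rebinds the name word and breaks the alias; it does not change the contents of the list. So the solution is:

  1. Keep the result of each replace operation
  2. Put the final result back into the list at the same position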
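Both points can be seen in isolation in a short sketch. The variable names and the sample list are made up for illustration; the behaviour shown (immutable strings, rebinding versus index assignment) is standard Python.

```python
words = ["drink", "water"]

# str.replace returns a new string; the original is left untouched
original = "drink"
original.replace("d", "#")      # result discarded, nothing changes
assert original == "drink"

# Rebinding the loop variable does not touch the list
for word in words:
    word = word.replace("d", "#")
assert words == ["drink", "water"]

# Assigning through the index does update the list
for index, word in enumerate(words):
    words[index] = word.replace("d", "#")
assert words == ["#rink", "water"]
```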

The smallest change to your code that achieves this while keeping the same basic design is:

from itertools import count  # So we can track the index to perform replacement at

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        for c in set(word):  # Change name to c; i is for indices, not characters/letters
                             # For minor efficiency gain, dedupe word so we don't replace same char over and over
            if c.isalpha():
                word = word.replace(c, '#')  # Reassign back to word so change not lost
        redacted_sentence[i] = word  # Replace original word in list with altered word
    return " ".join(redacted_sentence)

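A faster solution replaces the inner loop with a single-pass regular expression substitution or (if only ASCII needs to be handled) a str.translate call, turning the O(n²) work per word into O(n), for example: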

import re
from itertools import count  # So we can track the index to perform replacement at

# Precompile regex that matches only alphabetic characters and bind its sub method
# while we're at it    
replace_alpha = re.compile(r'[^\W\d_]').sub

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        # Replace every alphabetic character with # in provided word and replace
        # list's contents at same index
        redacted_sentence[i] = replace_alpha('#', word)
    return " ".join(redacted_sentence)
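The str.translate alternative mentioned for ASCII-only input is not shown above, so here is a minimal sketch of how it might look. The names `ASCII_TO_HASH` and `redact_words_ascii` are hypothetical, and the table deliberately covers only the 52 ASCII letters, so accented or non-Latin letters pass through unchanged.

```python
import string

# Translation table mapping every ASCII letter to '#'.
# Assumption: non-ASCII letters do not need to be redacted.
ASCII_TO_HASH = str.maketrans(string.ascii_letters, "#" * len(string.ascii_letters))

def redact_words_ascii(sentence):
    words = sentence.split()
    # Indices 2, 5, 8, ... correspond to every third word
    for index in range(2, len(words), 3):
        words[index] = words[index].translate(ASCII_TO_HASH)
    return " ".join(words)

print(redact_words_ascii("You cannot drink the word 'water'."))
# You cannot ##### the word '#####'.
```

Iterating over range(2, len(words), 3) gives the same indices as zip(count(2, 3), words[2::3]) without needing itertools, at the cost of indexing into the list inside the loop.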

Here is one way you could do it.

def redact_words(sentence, red_acted_words=None):
    if red_acted_words is None:
        red_acted_words = sentence.split()[2::3]
    for rword in red_acted_words:
        j = "#" * len(rword)
        sentence = sentence.split(rword)
        sentence = j.join(sentence)
    return sentence
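# Explicit list: every occurrence of each listed string is replaced by #s
print(redact_words("You cannot drink the word 'water'.", red_acted_words=["water", "drink"]))
# Default: every third whitespace-separated token is replaced whole, punctuation included
print(redact_words("You cannot drink the word 'water'."))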

