
Finding every third word in a sentence and replacing only its letters with the # symbol

This is my code:

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for word in redacted_sentence[2::3]:
        for i in word:
            if i.isalpha():
                word.replace(i, '#')
            else:
                continue
    return " ".join(redacted_sentence)

If the input is sentence = "You cannot drink the word 'water'.", the output should be "You cannot ##### the word '#####'."

The output I get is just a list of the words in my input.

We can accomplish this with a little Python magic:

def redact_words(sentence):
    redacted_sentence = []
    sentence = sentence.split()
    for pos, word in enumerate(sentence, start=1):
        if pos % 3 == 0:
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)

First, we create a list to contain the words of the new sentence. Next, after splitting the sentence into a list, we use enumerate to generate the position of each word in the sentence, along with the word itself. By starting at 1, we can use the modulus operator to see if the position is evenly divisible by 3. If so, we use a comprehension to replace all the alphabetic characters in word with #, leaving the other characters alone, then assign the result back to word. Finally, we append the word to the redacted_sentence list, regardless of whether it has been changed, and return a string with all the words joined together with a space.
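As a quick check, the function can be run against the example from the question. The function is repeated here only so the snippet runs on its own; the second input is a made-up extra case, not something from the original post.

```python
def redact_words(sentence):
    redacted_sentence = []
    sentence = sentence.split()
    # Positions start at 1 so that every third word has pos % 3 == 0
    for pos, word in enumerate(sentence, start=1):
        if pos % 3 == 0:
            # Replace letters only; digits and punctuation are kept
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)


# Example from the question
result = redact_words("You cannot drink the word 'water'.")
print(result)

# Hypothetical extra input: the third word mixes a letter, a digit and punctuation
mixed = redact_words("a b c3! d")
print(mixed)
```

Note that sentence.split() splits on any run of whitespace, so the rebuilt sentence always uses single spaces between words.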

You've got two issues that come down to the same problem: creating and discarding modified copies of the string while leaving the original untouched. The result of str.replace must be assigned somewhere to be useful (here, by reassigning word). In addition, to update the original list, you must assign to that index in the list: word is an alias to the object in the list, and reassigning word just rebinds the name and breaks the aliasing; it doesn't change the contents of the list. So the solution is:

  1. Keep the results from each replace operation
  2. Put the final result back into the list at the same location
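Both points can be seen in isolation with a small sketch. The variable names below are illustrative only and do not come from the question's code.

```python
word = "drink"
word.replace("d", "#")         # returns a new string, which is thrown away here
unchanged = word               # str objects are immutable, so word is still "drink"

kept = word.replace("d", "#")  # point 1: keep the returned copy

words = ["You", "cannot", "drink"]
for w in words:
    w = w.upper()              # rebinds the loop name w only
after_rebinding = list(words)  # the list contents have not changed

for i, w in enumerate(words):
    words[i] = w.upper()       # point 2: assigning by index updates the list
```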

The minimal modification to your code that achieves this while still following the same basic design is:

from itertools import count  # So we can track the index to perform replacement at

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        for c in set(word):  # Change name to c; i is for indices, not characters/letters
                             # For minor efficiency gain, dedupe word so we don't replace same char over and over
            if c.isalpha():
                word = word.replace(c, '#')  # Reassign back to word so change not lost
        redacted_sentence[i] = word  # Replace original word in list with altered word
    return " ".join(redacted_sentence)

A faster solution would replace the inner loop with a single-pass regex substitution or (if only ASCII needs to be handled) a str.translate call, replacing O(n²) work per word with O(n) work, e.g.:

import re
from itertools import count  # So we can track the index to perform replacement at

# Precompile regex that matches only alphabetic characters and bind its sub method
# while we're at it    
replace_alpha = re.compile(r'[^\W\d_]').sub

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        # Replace every alphabetic character with # in provided word and replace
        # list's contents at same index
        redacted_sentence[i] = replace_alpha('#', word)
    return " ".join(redacted_sentence)
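The str.translate alternative mentioned above is not shown in the answer. A minimal sketch of how it might look, assuming only ASCII letters need to be redacted; the names ASCII_LETTERS_TO_HASH and redact_words_translate are made up for this illustration.

```python
import string

# Translation table mapping every ASCII letter to '#'.
# Assumption: non-ASCII letters (e.g. 'é') are intentionally left untouched.
ASCII_LETTERS_TO_HASH = str.maketrans(
    string.ascii_letters, "#" * len(string.ascii_letters)
)


def redact_words_translate(sentence):
    words = sentence.split()
    # Indices 2, 5, 8, ... are the third, sixth, ninth, ... words
    for i in range(2, len(words), 3):
        words[i] = words[i].translate(ASCII_LETTERS_TO_HASH)
    return " ".join(words)


print(redact_words_translate("You cannot drink the word 'water'."))
```

Unlike the regex version, this leaves accented and other non-ASCII letters as they are, so it is only a fit when the input is known to be ASCII.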

Here is a way you can do it.

def redact_words(sentence, red_acted_words=None):
    if red_acted_words is None:
        red_acted_words = sentence.split()[2::3]
    for rword in red_acted_words:
        j = "#" * len(rword)
        sentence = sentence.split(rword)
        sentence = j.join(sentence)
    return sentence
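
# With an explicit word list, every occurrence of each listed substring is replaced:
redact_words("You cannot drink the word 'water'.", red_acted_words=["water", "drink"])
# returns "You cannot ##### the word '#####'."

# With the default (every third word), punctuation attached to the word is replaced too:
redact_words("You cannot drink the word 'water'.")
# returns "You cannot ##### the word ########"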
