I want to redact a sentence and return # for every third word in the sentence
Finding every third word in a sentence and replacing only its letters with the # symbol
Here is my code:
```python
def redact_words(sentence):
    redacted_sentence = sentence.split()
    for word in redacted_sentence[2::3]:
        for i in word:
            if i.isalpha():
                word.replace(i, '#')
            else:
                continue
    return " ".join(redacted_sentence)
```
If the input is `sentence = "You cannot drink the word 'water'."`, the output should be `"You cannot ##### the word '#####'."`.
The output I get is just the sentence I passed in, unchanged.
We can do this with a little Python magic:
```python
def redact_words(sentence):
    redacted_sentence = []
    sentence = sentence.split()
    for pos, word in enumerate(sentence, start=1):
        if pos % 3 == 0:
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)
```
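First, we create a list to hold the words of the new sentence. Next, after splitting the sentence into a list, we use `enumerate` to generate the position of each word in the sentence along with the word itself. Because we start counting at 1, we can use the modulo operator to check whether the position is divisible by 3. If it is, we use a generator expression to replace every alphabetic character in `word` with `#`, leaving all other characters alone, and reassign the result to `word`. Finally, we append the word to the `redacted_sentence` list, whether or not it was changed, and return a string with all the words joined by spaces.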
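To see why counting from 1 with `pos % 3 == 0` selects the same words as the question's `[2::3]` slice, here is a small check on the example sentence. It is only an illustration; the variable names (`by_position`, `by_slice`, `redacted`) are made up for this sketch.

```python
words = "You cannot drink the word 'water'.".split()

# Positions counted from 1: every third word has pos % 3 == 0
by_position = [word for pos, word in enumerate(words, start=1) if pos % 3 == 0]

# The same words picked by 0-based slicing, as in the question's code
by_slice = words[2::3]

assert by_position == by_slice == ["drink", "'water'."]

# The generator expression redacts letters but leaves punctuation alone
redacted = "".join("#" if ch.isalpha() else ch for ch in "'water'.")
assert redacted == "'#####'."
```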
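You have two problems that boil down to the same issue: you create and discard mutated copies of the string while leaving the original unchanged. The result of `str.replace` must be assigned somewhere useful (in this case, usually back to `word`), and to update the original `list` you must reassign the index in the list. `word` is a separate alias to the object in the `list`, but reassigning `word` just rebinds `word` and breaks the alias; it does not change the contents of the `list`.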
的内容)。 所以解决方案是:
replace
操作的结果list
中在保持相同基本设计的同时实现此结果的代码的最低限度修改是:
```python
from itertools import count  # So we can track the index to perform replacement at

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        for c in set(word):  # Change name to c; i is for indices, not characters/letters
            # For minor efficiency gain, dedupe word so we don't replace same char over and over
            if c.isalpha():
                word = word.replace(c, '#')  # Reassign back to word so change not lost
        redacted_sentence[i] = word  # Replace original word in list with altered word
    return " ".join(redacted_sentence)
```
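The `zip(count(2, 3), redacted_sentence[2::3])` pairing is what lets the loop write back to the right slot. A quick check on the example sentence shows which (index, word) pairs it yields; the `range` comparison at the end is an alternative index source added for illustration, not part of the answer's code.

```python
from itertools import count

words = "You cannot drink the word 'water'.".split()

# count(2, 3) yields 2, 5, 8, ...; zip stops as soon as the slice runs out
pairs = list(zip(count(2, 3), words[2::3]))
assert pairs == [(2, "drink"), (5, "'water'.")]

# Each index really points at the word it is paired with
assert all(words[i] == w for i, w in pairs)

# Equivalent indices without itertools
assert [i for i, _ in pairs] == list(range(2, len(words), 3))
```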
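A faster solution would replace the inner loop with a single-pass regex substitution or (if you only need to handle ASCII) a `str.translate` call, turning the O(n²) work per word into O(n), for example: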
```python
import re
from itertools import count  # So we can track the index to perform replacement at

# Precompile regex that matches only alphabetic characters and bind its sub method
# while we're at it
replace_alpha = re.compile(r'[^\W\d_]').sub

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        # Replace every alphabetic character with # in provided word and replace
        # list's contents at same index
        redacted_sentence[i] = replace_alpha('#', word)
    return " ".join(redacted_sentence)
```
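The `str.translate` option mentioned above is not shown in code, so here is a minimal sketch of it, assuming only ASCII letters need redacting. The names `_ASCII_REDACT` and `redact_words_ascii` are hypothetical; the translation table is built once with `str.maketrans` and maps each of the 52 ASCII letters to `#`.

```python
import string
from itertools import count

# Hypothetical table: every ASCII letter maps to '#'; all other characters are left as-is
_ASCII_REDACT = str.maketrans(string.ascii_letters, "#" * len(string.ascii_letters))

def redact_words_ascii(sentence):
    """Redact the letters of every third word, ASCII letters only (a sketch)."""
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):
        # One pass over the word per translate call
        redacted_sentence[i] = word.translate(_ASCII_REDACT)
    return " ".join(redacted_sentence)
```

Unlike the regex version, non-ASCII letters survive: `redact_words_ascii("a b café")` returns `"a b ###é"`, which is why this variant only fits ASCII-only input.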
Here is one way you could do it:
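```python
import string

def redact_words(sentence, red_acted_words=None):
    if red_acted_words is None:
        # Every third word, with surrounding punctuation stripped so that only
        # its letters are redacted ("'water'." -> "water")
        red_acted_words = [w.strip(string.punctuation) for w in sentence.split()[2::3]]
    for rword in red_acted_words:
        if not rword:
            continue  # an all-punctuation word strips to "", and str.split("") raises ValueError
        j = "#" * len(rword)
        # Replace every occurrence of rword with the same number of '#' characters.
        # Note: this is a plain substring match, so it also hits rword inside longer words.
        sentence = j.join(sentence.split(rword))
    return sentence

print(redact_words("You cannot drink the word 'water'.", red_acted_words=["water", "drink"]))
# You cannot ##### the word '#####'.
print(redact_words("You cannot drink the word 'water'."))
# You cannot ##### the word '#####'.
```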
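Because the approach above redacts by plain substring, a selected word is also blanked out inside longer words (redacting "the" would hit "other" too). A possible refinement, sketched here as a hypothetical `redact_whole_words` helper, anchors each match with `\b` word boundaries using `re.sub` and `re.escape`. It still redacts by value, so every whole-word occurrence of a selected word is redacted, not only the one at the third position, and its default path strips surrounding punctuation the same way.

```python
import re
import string

def redact_whole_words(sentence, red_acted_words=None):
    """Hypothetical variant: only redact whole-word matches of each selected word."""
    if red_acted_words is None:
        # Every third word, surrounding punctuation stripped so only letters are redacted
        red_acted_words = [w.strip(string.punctuation) for w in sentence.split()[2::3]]
    for rword in red_acted_words:
        if not rword:
            continue  # nothing to redact for an all-punctuation word
        # \b keeps "the" from matching inside "other"; re.escape guards regex metacharacters
        pattern = r"\b" + re.escape(rword) + r"\b"
        sentence = re.sub(pattern, "#" * len(rword), sentence)
    return sentence
```

For example, `redact_whole_words("the other thing", ["the"])` gives `"### other thing"`, leaving "other" intact.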