I want to redact a sentence and return # for every third word in the sentence
Find every third word in a sentence and replace only its letters with the # symbol.
Here is my code:
def redact_words(sentence):
    redacted_sentence = sentence.split()
    for word in redacted_sentence[2::3]:
        for i in word:
            if i.isalpha():
                word.replace(i, '#')
            else:
                continue
    return " ".join(redacted_sentence)
If the input is sentence = "You cannot drink the word 'water'."
the output should be "You cannot ##### the word '#####'."
The output I get is just the list of words from my input.
We can do this with a little Python magic:
def redact_words(sentence):
    redacted_sentence = []
    sentence = sentence.split()
    for pos, word in enumerate(sentence, start=1):
        if pos % 3 == 0:
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)
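First, we create a list to hold the words of the new sentence. Next, after splitting the sentence into a list, we use enumerate to generate the position of each word in the sentence along with the word itself. Because we start counting at 1, we can use the modulo operator to check whether the position is divisible by 3. If it is, we use a comprehension to replace every alphabetic character in word with #, leave the other characters alone, and reassign the result to word. Finally, we append the word to the redacted_sentence list, whether or not it was changed, and return a string with all the words joined by spaces.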
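To see which words the position test selects, here is a minimal sketch. The helper name third_word_positions is hypothetical and not part of the answer above; it only lists the 1-based positions and words for which pos % 3 == 0 holds:

```python
def third_word_positions(sentence):
    """Return (position, word) pairs for every third word, counting from 1."""
    return [
        (pos, word)
        for pos, word in enumerate(sentence.split(), start=1)
        if pos % 3 == 0
    ]


print(third_word_positions("You cannot drink the word 'water'."))
# [(3, 'drink'), (6, "'water'.")]
```

Note that the sixth word still carries its quotes and full stop, which is why the answer masks only the characters for which isalpha() is true.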
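You have two problems that boil down to the same one: you create mutated copies of the string and then discard them, leaving the original untouched. The result of str.replace has to be assigned somewhere to be useful (in this case, usually by reassigning word), and to update the original list you also have to reassign the index in the list (word is a separate alias to the object in the list, but reassigning word only rebinds word and breaks the alias; it does not change the contents of the list). So the solution is to keep the result of each replace operation and to put the final result back into the list.
The minimal modification to your code that achieves this while keeping the same basic design is: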
from itertools import count  # So we can track the index to perform replacement at
def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        for c in set(word):  # Change name to c; i is for indices, not characters/letters
            # For minor efficiency gain, dedupe word so we don't replace same char over and over
            if c.isalpha():
                word = word.replace(c, '#')  # Reassign back to word so change not lost
        redacted_sentence[i] = word  # Replace original word in list with altered word
    return " ".join(redacted_sentence)
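The aliasing point can be checked in isolation. The following is a small illustrative sketch (the variable names are made up for the demonstration) showing that str.replace returns a new string, that rebinding a loop variable leaves the list alone, and that assigning through the index is what changes it:

```python
words = ["You", "cannot", "drink"]

# str.replace returns a new string; the original string is left untouched.
word = words[2]
word.replace("d", "#")  # result is discarded
unchanged = word        # still "drink"

# Rebinding the loop variable does not modify the list either.
for w in words:
    w = w.upper()
after_rebinding = list(words)  # still the original three words

# Assigning through the index is what actually updates the list.
for i, w in enumerate(words):
    words[i] = w.upper()

print(unchanged, after_rebinding, words)
```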
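A faster solution replaces the inner loop with a single-pass regular expression substitution or (if only ASCII needs to be handled) a str.translate call, reducing the work per word from O(n²) to O(n), for example: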
import re
from itertools import count  # So we can track the index to perform replacement at
# Precompile regex that matches only alphabetic characters and bind its sub method
# while we're at it
replace_alpha = re.compile(r'[^\W\d_]').sub
def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        # Replace every alphabetic character with # in provided word and replace
        # list's contents at same index
        redacted_sentence[i] = replace_alpha('#', word)
    return " ".join(redacted_sentence)
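The str.translate alternative is only mentioned, not shown, so here is one possible sketch of it under the stated ASCII-only assumption. The names ASCII_LETTERS_TO_HASH and redact_words_ascii are invented for this example; non-ASCII letters such as é are deliberately left unmasked by this variant:

```python
import string

# Translation table mapping every ASCII letter to '#'.
# Characters not in the table (digits, punctuation, non-ASCII) pass through unchanged.
ASCII_LETTERS_TO_HASH = str.maketrans(
    string.ascii_letters, "#" * len(string.ascii_letters)
)


def redact_words_ascii(sentence):
    words = sentence.split()
    # Indices 2, 5, 8, ... are the 3rd, 6th, 9th, ... words.
    for i in range(2, len(words), 3):
        words[i] = words[i].translate(ASCII_LETTERS_TO_HASH)
    return " ".join(words)


print(redact_words_ascii("You cannot drink the word 'water'."))
# You cannot ##### the word '#####'.
```

Iterating over range(2, len(words), 3) is simply another way to track the index; it avoids the zip with count but does the same job.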
Here is one way you could do it.
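def redact_words(sentence, red_acted_words=None):
    # Default to every third word of the sentence when no list is given
    if red_acted_words is None:
        red_acted_words = sentence.split()[2::3]
    for rword in red_acted_words:
        j = "#" * len(rword)  # One # per character, punctuation included
        # Split on the word and rejoin with the mask; this replaces every
        # occurrence of rword, including inside other words
        sentence = j.join(sentence.split(rword))
    return sentence
redact_words("You cannot drink the word 'water'.", red_acted_words=["water", "drink"])  # "You cannot ##### the word '#####'."
redact_words("You cannot drink the word 'water'.")  # "You cannot ##### the word ########"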