Finding every third word in a sentence and replacing only its letters with the # symbol

Here is my code:

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for word in redacted_sentence[2::3]:
        for i in word:
            if i.isalpha():
                word.replace(i, '#')
            else:
                continue
    return " ".join(redacted_sentence)

If the input is sentence = "You cannot drink the word 'water'.", the output should be "You cannot ##### the word '#####'."

The output I get is just the words I passed in, unchanged.

We can do this with a little Python magic:

def redact_words(sentence):
    redacted_sentence = []
    sentence = sentence.split()
    for pos, word in enumerate(sentence, start=1):
        if pos % 3 == 0:
            word = "".join("#" if letter.isalpha() else letter for letter in word)
        redacted_sentence.append(word)
    return " ".join(redacted_sentence)

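First, we create a list to hold the words of the new sentence. Next, after splitting the sentence into a list, we use enumerate to generate the position of each word in the sentence along with the word itself. By starting at 1, we can use the modulo operator to check whether the position is evenly divisible by 3. If it is, we use a comprehension to replace every alphabetic character in word with #, leaving the other characters alone, and reassign the result back to word. Finally, we append the word to the redacted_sentence list, whether or not it was changed, and return a single string with all the words joined by spaces.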
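The two building blocks described above can be tried in isolation. This is only a small sketch: the helper name mask_letters is made up for illustration and is not part of the answer.

```python
def mask_letters(word):
    # Hypothetical helper: the same generator expression as in the answer.
    # Letters become '#'; every other character is kept as-is.
    return "".join("#" if ch.isalpha() else ch for ch in word)

words = "You cannot drink the word 'water'.".split()

# enumerate(..., start=1) yields 1-based positions, so every third word
# is exactly the one whose position is divisible by 3.
third_words = [word for pos, word in enumerate(words, start=1) if pos % 3 == 0]

print(third_words)               # ['drink', "'water'."]
print(mask_letters("'water'."))  # '#####'.
```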

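You have two problems that boil down to the same issue: creating and discarding modified copies of the string while leaving the original untouched. The result of str.replace must be assigned somewhere to be useful (here, typically by reassigning word). In addition, to update the original list you must reassign that index in the list: word is a separate alias to the object in the list, and reassigning word just rebinds the name and breaks the aliasing; it does not change the contents of the list. So the solution is: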

  1. Keep the result of each replace operation
  2. Put the final result back into the list at the same position
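Both points can be seen on their own. The snippet below is only an illustrative sketch with arbitrary variable names, not code from the question:

```python
word = "drink"

# str.replace returns a new string; the original is left untouched.
word.replace("d", "#")
print(word)   # drink

# Keeping the result means rebinding the name to the new string.
word = word.replace("d", "#")
print(word)   # #rink

words = ["You", "cannot", "drink"]
for w in words:
    # Rebinding the loop variable does not touch the list itself.
    w = "#" * len(w)
print(words)  # ['You', 'cannot', 'drink']

# Assigning through the index is what actually changes the list.
words[2] = "#" * len(words[2])
print(words)  # ['You', 'cannot', '#####']
```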

The minimal modification to your code that achieves this while keeping the same basic design is:

from itertools import count  # So we can track the index to perform replacement at

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        for c in set(word):  # Change name to c; i is for indices, not characters/letters
                             # For minor efficiency gain, dedupe word so we don't replace same char over and over
            if c.isalpha():
                word = word.replace(c, '#')  # Reassign back to word so change not lost
        redacted_sentence[i] = word  # Replace original word in list with altered word
    return " ".join(redacted_sentence)

A faster solution replaces the inner loop with a single-pass regex substitution or (if only ASCII needs to be handled) a str.translate call, turning O(n²) work per word into O(n), e.g.:

import re
from itertools import count  # So we can track the index to perform replacement at

# Precompile regex that matches only alphabetic characters and bind its sub method
# while we're at it    
replace_alpha = re.compile(r'[^\W\d_]').sub

def redact_words(sentence):
    redacted_sentence = sentence.split()
    for i, word in zip(count(2, 3), redacted_sentence[2::3]):  # Track index and value
        # Replace every alphabetic character with # in provided word and replace
        # list's contents at same index
        redacted_sentence[i] = replace_alpha('#', word)
    return " ".join(redacted_sentence)
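For the str.translate option mentioned earlier, a minimal sketch could look like the following. It assumes only ASCII letters need masking; the names ASCII_LETTERS_TO_HASH and redact_words_ascii are hypothetical, and it walks the indices with range instead of zip and count, which is just one way to do it.

```python
import string

# Translation table mapping every ASCII letter to '#'. Non-ASCII letters
# (e.g. 'é') are NOT covered, unlike with str.isalpha or the regex above.
ASCII_LETTERS_TO_HASH = str.maketrans(
    string.ascii_letters, "#" * len(string.ascii_letters)
)

def redact_words_ascii(sentence):
    # Hypothetical name; same overall structure as the versions above.
    words = sentence.split()
    for i in range(2, len(words), 3):  # indices 2, 5, 8, ...: every third word
        words[i] = words[i].translate(ASCII_LETTERS_TO_HASH)
    return " ".join(words)

print(redact_words_ascii("You cannot drink the word 'water'."))
# You cannot ##### the word '#####'.
```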

Here is one way you could do it.

def redact_words(sentence, red_acted_words=None):
    if red_acted_words is None:
        red_acted_words = sentence.split()[2::3]
    for rword in red_acted_words:
        j = "#" * len(rword)
        sentence = sentence.split(rword)
        sentence = j.join(sentence)
    return sentence
redact_words("You cannot drink the word 'water'.", red_acted_words=["water", "drink"])
redact_words("You cannot drink the word 'water'.")

