Pandas: combining unexpectedly split strings, then matching and replacing

Question

My PowerPoint file contains the following paragraph of text:

para = "XX NOV 2021, Time: xx:xx – xx:xx hrs (90mins)"

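This para (a str) is split by built-in PPT logic, which breaks it up unexpectedly into the keywords shown below. (I do not control this splitting logic.) Although it is outside the scope of this question, if you want to know more about my problem, you can refer to this post.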

split_list = ["XX Nov", "2021," ,"Time:xx", ":xx - xx:xx", " hrs (90mins)"]

Now, my goal is to replace the keyword Nov 2021 in para with Nov 2022 (like a CTRL+F find and replace).

So, I tried the following:

for i, s in enumerate(split_list):
   print(type(s))   # str type is returned
   cur_text = s
   new_text = cur_text.replace("Nov 2021", "Nov 2022")
   split_list[i] = new_text   # a list has no .update(); assign by index
new_para = ' '.join(split_list)

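As expected, this does not do the replacement: my search term Nov 2021 finds no match, because the string is stored as XX Nov, 2021, and so on.

How can we combine the previous N keywords with the current keyword in split_list and then do the replacement? N can range from 1 to 3.

Is there a Python loop-based solution (one where we can look at the previous and current keywords at the same time)?

Please note that I cannot do the replacement on the input paragraph itself, because that would lose all the text formatting, such as bold and italics. That is why we do the replacement within the list of keywords (taken from para).

Basically, I want my final output to look like this: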

para = "XX NOV 2022, Time: xx:xx – xx:xx hrs (90mins)"

Answer 1

A simple way to generate bigrams (two-word sequences):

sentence = " ".join(split_list)
size_n_gram = 2
words = sentence.split(" ")
n_gram_words = []
for i, _ in enumerate(words):
    if i + size_n_gram > len(words):
        # Fewer than size_n_gram words remain, so every n-gram is collected
        break
    else:
        # Slice by the n-gram size (not by the result list)
        n_gram_words.append(words[i:i+size_n_gram])

# n_gram_words now holds every two-word sequence. Search it for the
# sequence to replace, then rebuild the string from the corrected words.
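
The n-gram idea above finds the match, but it does not say how to write the result back into the separate pieces. One possible way to do that is sketched below. It is only a sketch under assumptions: the function name replace_across_chunks and the list runs are hypothetical, and it assumes the pieces concatenate directly into the paragraph text (each piece carries its own spaces), which differs slightly from the split_list shown in the question. The replacement text is placed in the piece where the match starts, so the number of pieces, and hence whatever formatting is attached to each one, stays the same.

```python
def replace_across_chunks(chunks, old, new):
    """Replace `old` with `new` even when `old` straddles chunk boundaries.

    Assumption: "".join(chunks) reproduces the paragraph text.
    Returns a new list with the same number of chunks as the input.
    """
    if not old:
        return list(chunks)

    text = "".join(chunks)
    # owner[k] is the index of the chunk that character k of `text` came from.
    owner = [i for i, chunk in enumerate(chunks) for _ in chunk]
    # Rebuilt content of each chunk, collected as small string fragments.
    pieces = [[] for _ in chunks]

    pos = 0
    while pos < len(text):
        if text.startswith(old, pos):
            # The whole replacement goes into the chunk where the match
            # begins; matched characters owned by later chunks are dropped.
            pieces[owner[pos]].append(new)
            pos += len(old)
        else:
            pieces[owner[pos]].append(text[pos])
            pos += 1

    return ["".join(p) for p in pieces]


# Hypothetical pieces, similar to split_list but with spaces preserved.
runs = ["XX Nov", " 2021,", " Time: xx", ":xx - xx:xx", " hrs (90mins)"]
new_runs = replace_across_chunks(runs, "Nov 2021", "Nov 2022")
print(new_runs)
print("".join(new_runs))  # XX Nov 2022, Time: xx:xx - xx:xx hrs (90mins)
```

Because the match is located in the joined text, the search term may span any number of pieces, so the N = 1 to 3 limit does not need to be handled separately. The trade-off is that the replaced text takes on the formatting of the piece where the match starts.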
