Python: Regular expressions on text

Question

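Q4: Remove all reference numbers (including the brackets) from the body text. It should remove items such as [8], etc. Before removing them, print a list of these reference numbers, then print the following: There are {length of list} reference numbers to be deleted. My code is as follows: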

import re
with open('macOS.txt', 'r') as f:
  content = f.read()
  
temp = re.sub('<[^>]*>', '', content)
print(f'There are {len(temp)} references numbers to be deleted.')
print(temp)

I am not sure whether this is the correct answer, though. To remove [8], [9] I used re.sub('<[^>]*>', '', content).

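Q5: Using the new text from Q4, split the text to find out how many sentences it contains. Be careful not to split on periods such as those in the following:

Apple Inc.

OS X 10.1 since 2001, etc.

Then print the following: There are {length of list} sentences in the text.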

For Q5, however, I don't know how to use the new text from Q4. Could anyone please guide me on how to do this?

Answer 1

If you want to match 1 or more digits between square brackets, you can use \[\d+].
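As a quick check of why the attempt in the question does not do what Q4 asks, here is a minimal sketch on a made-up sample string (the sample text is an assumption, not taken from macOS.txt). The pattern <[^>]*> only matches angle-bracketed tags, so it finds no reference numbers, and len(temp) in the question counts the characters of the text rather than the matches.

```python
import re

# Hypothetical sample text; the real input would come from macOS.txt
sample = "macOS is developed by Apple Inc.[8] It succeeded Mac OS 9.[9][10]"

# The pattern from the question targets <...> tags, so it finds nothing here
tag_matches = re.findall(r"<[^>]*>", sample)

# \[ matches a literal "[", \d+ one or more digits, ] a literal "]"
ref_matches = re.findall(r"\[\d+]", sample)

print(tag_matches)  # []
print(ref_matches)  # ['[8]', '[9]', '[10]']
```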

You can get the number of matches by calling len on the result of re.findall, and use re.sub to replace the matches with a space.

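import re

# Match one or more digits wrapped in square brackets, e.g. [8]
pattern = r"\[\d+]"

with open('macOS.txt', 'r') as f:
    content = f.read()

# Collect the matches first so they can be listed and counted
refs = re.findall(pattern, content)
print(refs)
print(f'There are {len(refs)} reference numbers to be deleted.')

# Replace every reference number with a space
result = re.sub(pattern, ' ', content)

# use result for further processing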
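
For Q5, the result string is simply the input to the next step. One possible sketch follows, assuming that "Inc." is the only abbreviation that needs protecting (an assumption; real text will likely need a longer list of abbreviations or a dedicated sentence tokenizer). The sample text and the helper name split_sentences are hypothetical and only there for illustration.

```python
import re

# Hypothetical stand-in for the contents of macOS.txt
content = ("macOS is developed by Apple Inc.[8] since 2001.[9] "
           "OS X 10.1 was released in 2001.[10] It was a free update.")

# Q4: strip the reference numbers, as in the answer above
result = re.sub(r"\[\d+]", " ", content)

# Q5: split on whitespace that follows ., ! or ?, unless the period
# belongs to "Inc." (assumed here to be the only abbreviation to protect).
# "10.1" is left intact because its period is not followed by whitespace.
sentence_break = re.compile(r"(?<!Inc\.)(?<=[.!?])\s+")

def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s for s in sentence_break.split(text.strip()) if s]

sentences = split_sentences(result)
print(f"There are {len(sentences)} sentences in the text.")
```

Both lookbehinds are fixed-width, which Python's re module requires. Extending the approach to more abbreviations would mean adding one negative lookbehind per abbreviation.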

Solution 1
1 2021-05-03 10:18:01
