使用 python 從文本文件中分離所有段落並保存每個分離段落的單個文本文件

Question

問題摘要：我有一個包含 100 個段落的文本文件。 我需要分離出所有這 100 個段落，並將它們分別保存在 100 個文本文件中。

輸入文本文件中的段落模式：

25763772|t|DCTN4 as a modifier of chronic Pseudomonas aeruginosa infection in cystic fibrosis
25763772|a|Pseudomonas aeruginosa (Pa) infection in cystic fibrosis (CF) patients is present
25763772    0   5   DCTN4   T116,T123   C4308010
25763772    23  63  chronic Pseudomonas aeruginosa infection    T047    C0854135
25763772    67  82  cystic fibrosis T047    C0010674

25847295|t|Nonylphenol diethoxylate inhibits apoptosis induced in PC12 cells
25847295|a|Nonylphenol and short-chain nonylphenol ethoxylates such as NP2 EO are digested
25847295    0   24  Nonylphenol diethoxylate    T131    C1254354
25847295    25  33  inhibits    T052    C3463820

同樣，該單個文本文件中存在 100 段可變長度的段落。

我正在嘗試這樣的代碼，它沒有顯示任何錯誤，但無法單獨提取和保存單個段落。 請就此提出任何幫助或解決方案。 提前致謝。

代碼：

with open('corpus_pubtator1.txt', 'r') as contents, open('tested23.txt', 'w') as file:
    contents = contents.read()
    lines = contents.split('\n')
    for index, line in enumerate(lines):
        if index != len(lines) - 1:
            file.write(line + '.\n')
        else:
            pass

Answer 1

嘗試這個：

lines = []
with open("corpus_pubtator1.txt", "r") as rf:
    lines = rf.readlines()
lines = [i if i else i.strip() for i in lines]
passages = []
passage_cache = []
for i, line in enumerate(lines):
    if i == len(lines) - 1:
        passages.append(passage_cache)
    if line.strip():
        passage_cache.append(line)
    else:
        passages.append(passage_cache)
        passage_cache = [line]
for i, passage in enumerate(passages):
    with open(f"tested{i}.txt", 'w') as wf:
        for line in passage:
            wf.write(line)

它將打開第一個輸入文件，讀取所有行並區分段落之間的空行，並且對於每個段落，它將創建一個單獨的文本文件並在其中寫入行。

使用 python 從文本文件中分離所有段落並保存每個分離段落的單個文本文件

問題描述

1 個解決方案

解決方案1
0 2021-12-06 11:13:04

使用 python 從文本文件中分離所有段落並保存每個分離段落的單個文本文件

問題描述

1 個解決方案

解決方案1 0 2021-12-06 11:13:04

解決方案1
0 2021-12-06 11:13:04