Separating all passages from a text file and saving each passage as an individual text file using Python

Summary of the problem: I have a text file that contains 100 passages. I need to separate out all 100 passages and save each one in its own text file, giving 100 files.

Pattern of passages in the input text file:

25763772|t|DCTN4 as a modifier of chronic Pseudomonas aeruginosa infection in cystic fibrosis
25763772|a|Pseudomonas aeruginosa (Pa) infection in cystic fibrosis (CF) patients is present
25763772    0   5   DCTN4   T116,T123   C4308010
25763772    23  63  chronic Pseudomonas aeruginosa infection    T047    C0854135
25763772    67  82  cystic fibrosis T047    C0010674

25847295|t|Nonylphenol diethoxylate inhibits apoptosis induced in PC12 cells
25847295|a|Nonylphenol and short-chain nonylphenol ethoxylates such as NP2 EO are digested
25847295    0   24  Nonylphenol diethoxylate    T131    C1254354
25847295    25  33  inhibits    T052    C3463820

Likewise, that single text file contains 100 passages of variable length.

I'm trying the code below. It runs without any error, but it does not extract or save even a single passage individually. Any help or solution would be appreciated. Thanks in advance.

Code:

with open('corpus_pubtator1.txt', 'r') as contents, open('tested23.txt', 'w') as file:
    contents = contents.read()
    lines = contents.split('\n')
    for index, line in enumerate(lines):
        if index != len(lines) - 1:
            file.write(line + '.\n')
        else:
            pass

Try this:

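# Read every line of the input file, keeping the trailing newlines.
with open("corpus_pubtator1.txt", "r") as rf:
    lines = rf.readlines()

passages = []       # completed passages, each a list of lines
passage_cache = []  # lines of the passage currently being collected
for line in lines:
    if line.strip():
        passage_cache.append(line)
    elif passage_cache:
        # A blank line ends the current passage; repeated blank lines are skipped.
        passages.append(passage_cache)
        passage_cache = []
# Save the final passage if the file does not end with a blank line.
if passage_cache:
    passages.append(passage_cache)

# Write each passage to its own file: tested0.txt, tested1.txt, ...
for i, passage in enumerate(passages):
    with open(f"tested{i}.txt", "w") as wf:
        wf.writelines(passage)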

This opens the input file, reads all of its lines, and treats each blank line as the boundary between two passages. For each passage it then creates a separate text file (tested0.txt, tested1.txt, and so on) and writes that passage's lines into it.
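As an alternative sketch, the same blank-line splitting can be done with a regular expression, naming each output file after the passage's leading ID (the text before the first `|`, such as 25763772 in the sample). The function names `split_passages`, `passage_id`, and `save_passages` and the output directory are hypothetical and not from the original post. The sketch assumes every passage starts with an `ID|t|...` line and that IDs are unique; a repeated ID would overwrite the earlier file.

```python
import re
from pathlib import Path


def split_passages(text):
    """Split text into passages separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


def passage_id(passage):
    """Return the leading ID of a passage (the text before the first '|')."""
    return passage.split("|", 1)[0].strip()


def save_passages(text, out_dir):
    """Write each passage to <out_dir>/<id>.txt and return the paths written."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for passage in split_passages(text):
        # Assumption: the ID is unique and safe to use as a file name.
        target = out_path / f"{passage_id(passage)}.txt"
        target.write_text(passage + "\n", encoding="utf-8")
        written.append(target)
    return written
```

It could be called as save_passages(Path("corpus_pubtator1.txt").read_text(encoding="utf-8"), "passages"), where "passages" is an assumed output folder name. Naming files by ID rather than by position makes it easier to find a given passage later, at the cost of relying on the ID line being present.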
