
How to read specific lines of .txt file?

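I am trying to extract information from a text file and store each "paragraph". By paragraph I mean the date (always at the first index) together with any description related to that date (the information after that date but before the next date). The .txt looks like: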

September 2013. **I NEED THE DATA THAT WOULD BE WRITTEN HERE STORED WITH ITS DATE HOWEVER 
WHEN ANOTHER DATE SHOWS UP IT NEEDS TO BE SEPERATED
September 2013. blah blah balh this is an example blah blaha blah I need the information hereblah blah balh this is an example blah blaha blah I need the information here
blah blah balh this is an example blah blaha blah I need the information here
August 2013. blah blah balh this is an example blah blaha blah I need the information here
August 2013.blah blah balh this is an example blah blaha blah I need the information here
blah blah balh this is an example blah blaha blah I need the information hereblah blah balh this is an example blah blaha blah I need the information hereblah blah balh this is an example blah blaha blah I need the information here
June 2013. blah blah balh this is an example blah blaha blah I need the information hereeeeee

There is no fixed number of lines after a date.

I was able to find every line that starts with a date:

# months is a list of month names defined elsewhere
with open("test.txt", encoding="utf8") as infile:
    for line in infile:
        for month in months:
            if month in line:
                print(line)

But this outputs:

May 2014. only the first line is taken in and not the rest of the paragraph

April 2013. only the first line is taken in and not the rest of the paragraph

December 2013. only the first line is taken in and not the rest of the paragraph

November 2012. only the first line is taken in and not the rest of the paragraph

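If the file you are reading fits in memory, the best option most of the time is to read the complete file and then operate on it.

If you might have big files (100 MB or more), you may need to read in chunks: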

https://stackoverflow.com/a/519653/562769

However, this means you need to write more complex logic to handle the chunks.
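As a rough illustration of that extra logic, here is a minimal sketch of a generator that reads a fixed number of characters at a time and holds back the unfinished tail of each chunk until the rest of the line arrives. The name iter_lines_in_chunks and the chunk_size parameter are made up for this example, and an in-memory io.StringIO stands in for a large file on disk.

```python
import io


def iter_lines_in_chunks(fp, chunk_size=64 * 1024):
    """Yield complete lines from fp, reading at most chunk_size characters at a time."""
    leftover = ""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        parts = (leftover + chunk).split("\n")
        # The last piece may be an incomplete line, so keep it for the next chunk.
        leftover = parts.pop()
        yield from parts
    if leftover:
        # The file did not end with a newline; emit the final line.
        yield leftover


# Hypothetical sample data; a tiny chunk size forces lines to span several chunks.
sample = io.StringIO("June 2013. first\nsecond line\nMay 2013. third")
lines = list(iter_lines_in_chunks(sample, chunk_size=8))
```

With a real file you would pass the object returned by open() instead of the StringIO.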

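If your lines can become arbitrarily big, reading line by line makes no sense. For the operating system / file system, a line is not a meaningful unit. A newline is just that: one character in a bigger file, like any other character.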

For the line matching, you can do something like this:

with open("file.txt") as fp:
    data = fp.read()

# matches() and operate() are placeholders that you define yourself
for line in data.split("\n"):
    if matches(line):
        operate(line)

Here, matches is a function that checks whether your date condition is met, and operate does whatever you want to do with that line.

The matches function can use several if/elif statements or a regular expression (the re module). Using split, startswith, or "pattern" in "haystack" might be useful.
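One possible way to write matches with the re module is sketched below. It assumes a date is an English month name, a space, a four-digit year, and a period at the very start of the line, as in the sample text. The MONTHS tuple and DATE_RE pattern are names invented for this example.

```python
import re

# Assumed date format: "<Month> <YYYY>." at the start of a line.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)
DATE_RE = re.compile(r"^(?:" + "|".join(MONTHS) + r") \d{4}\.")


def matches(line):
    """Return True if the line starts with a date such as 'August 2013.'."""
    return DATE_RE.match(line) is not None
```

Anchoring the pattern at the start of the line avoids false hits when a month name merely appears somewhere inside a description.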

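This will work assuming every line starts with a month and a year separated by a space. However, your sample text has lines that do not start with a month/year, which makes me wonder whether you expect lines that do not start with a month/year to be rejected.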

with open('filename.txt', 'r') as f:
    data = f.readlines()

for line in data:
    words = line.strip().split(' ')
    date = ' '.join(words[0:2])
    desc = ' '.join(words[2:])
    print(f'{date} | {desc}\n')
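If the continuation lines should instead be attached to the most recent date, which is what the question seems to ask for, something along these lines might work. This is only a sketch under the assumption that each entry starts with "<Month> <YYYY>." at the beginning of a line. The group_by_date function and the sample list are hypothetical.

```python
import re

# Assumed date format: "<Month> <YYYY>." at the start of a line.
DATE_RE = re.compile(
    r"^((?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4})\.\s*(.*)$"
)


def group_by_date(lines):
    """Return a list of (date, description) pairs, one per dated paragraph."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        m = DATE_RE.match(line)
        if m:
            # A new date starts a new entry.
            entries.append([m.group(1), m.group(2)])
        elif entries:
            # No date: append the text to the current entry.
            entries[-1][1] = (entries[-1][1] + " " + line).strip()
        # Lines that appear before the first date are ignored.
    return [tuple(e) for e in entries]


# Hypothetical sample shaped like the text in the question.
sample = [
    "September 2013. first entry\n",
    "continues here\n",
    "August 2013.second entry\n",
]
result = group_by_date(sample)
```

A list of pairs is used rather than a dict because the same date can occur more than once in the file.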

