
How to read specific lines of .txt file?

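I'm trying to extract information from a text file and store each "paragraph". By paragraph I mean the date (always the first thing on the line) together with whatever description belongs to that date (everything after that date but before the next date). The .txt looks like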

September 2013. **I NEED THE DATA THAT WOULD BE WRITTEN HERE STORED WITH ITS DATE HOWEVER 
WHEN ANOTHER DATE SHOWS UP IT NEEDS TO BE SEPERATED
September 2013. blah blah balh this is an example blah blaha blah I need the information hereblah blah balh this is an example blah blaha blah I need the information here
blah blah balh this is an example blah blaha blah I need the information here
August 2013. blah blah balh this is an example blah blaha blah I need the information here
August 2013.blah blah balh this is an example blah blaha blah I need the information here
blah blah balh this is an example blah blaha blah I need the information hereblah blah balh this is an example blah blaha blah I need the information hereblah blah balh this is an example blah blaha blah I need the information here
June 2013. blah blah balh this is an example blah blaha blah I need the information hereeeeee

There is no fixed number of lines after a date.

I am able to find every line that starts with a date

with open("test.txt", encoding="utf8") as input:
    for line in input:
        for month in months:
            if month in line:
                print(line)

But this outputs

"May 2014. only the first line is taken in and not the rest of the paragraph

April 2013. only the first line is taken in and not the rest of the paragraph

December 2013. only the first line is taken in and not the rest of the paragraph

November 2012. only the first line is taken in and not the rest of the paragraph

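If the file you are reading fits into memory, the best option most of the time is to read the complete file and then operate on it.

If you might have big files (100 MB or more), you may need to read them in chunks: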

https://stackoverflow.com/a/519653/562769

However, this means you need to write more complex logic to deal with those chunks.
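To give an idea of that extra logic, here is a minimal sketch of chunked reading. The names `read_in_chunks` and `iter_lines_from_chunks` and the 1 MiB default chunk size are my own choices for illustration, not something taken from the linked answer. The fiddly part is that a chunk can end in the middle of a line, so the incomplete tail has to be carried over to the next chunk.

```python
def read_in_chunks(file_object, chunk_size=1024 * 1024):
    """Yield successive pieces of at most chunk_size characters."""
    while True:
        chunk = file_object.read(chunk_size)
        if not chunk:
            break
        yield chunk


def iter_lines_from_chunks(file_object, chunk_size=1024 * 1024):
    """Rebuild complete lines from chunks (hypothetical helper)."""
    leftover = ""
    for chunk in read_in_chunks(file_object, chunk_size):
        pieces = (leftover + chunk).split("\n")
        # The last piece may be an incomplete line, so keep it for later.
        leftover = pieces.pop()
        yield from pieces
    if leftover:
        yield leftover
```

You would use it as `for line in iter_lines_from_chunks(fp): ...` on a file opened in text mode.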

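If your lines can become arbitrarily large, reading line by line makes no sense. To the operating system / file system, a line is not a meaningful unit. A newline is just that: one character in a bigger file, like any other character.

As for matching lines, you can do something like this: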

with open("file.txt") as fp:
    data = fp.read()

# Split the text into lines and handle every line that meets the date condition
for line in data.split("\n"):
    if matches(line):
        operate(line)

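Here matches is a function that checks whether your date condition is met, and operate does whatever you want to do with that line.

The matches function could use several if-elif statements or a regular expression (the re module). Using split / startswith / "pattern" in "haystack" might be useful.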
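As a rough sketch of what such a check could look like, here are two variants, one with the re module and one with startswith. The `MONTHS` tuple, the `DATE_AT_START` pattern and the name `matches_simple` are assumptions of mine; the question only shows lines starting with an English month name followed by a four-digit year.

```python
import re

# Assumed: dates are written as a full English month name plus a 4-digit year.
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)

# Month name, one space, four digits, anchored at the start of the line.
DATE_AT_START = re.compile(r"^(?:%s) \d{4}\b" % "|".join(MONTHS))


def matches(line):
    """Return True if the line starts with something like 'September 2013'."""
    return DATE_AT_START.match(line) is not None


def matches_simple(line):
    """Looser check: only tests for a leading month name (no year)."""
    # str.startswith accepts a tuple of prefixes.
    return line.startswith(MONTHS)
```

The startswith variant is shorter, but it would also accept a line that begins with a word such as "Maybe", so the regular expression is probably the safer choice here.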

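This will work assuming every line starts with a month and a year separated by a space. However, your sample text contains lines that do not start with a month/year, which makes me wonder whether you expect it to reject lines that don't begin with a month/year.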

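with open('filename.txt', 'r') as f:
    data = f.readlines()

for line in data:
    words = line.strip().split(' ')
    # The first two words are taken as the date, e.g. "September 2013."
    date = ' '.join(words[0:2])
    # Everything after the date is the description
    desc = ' '.join(words[2:])
    print(f'{date} | {desc}\n')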
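If lines without a leading date are meant to be continuations of the previous entry (which is how I read the question), something along these lines might be closer to what is wanted. It is only a sketch: `group_by_date` and `DATE_PREFIX` are names I made up, and it assumes dates always look like a full English month name followed by a four-digit year and an optional period.

```python
import re

# Assumed date format: "<Month> <YYYY>" with an optional trailing period.
DATE_PREFIX = re.compile(
    r"^((?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4})\.?\s*"
)


def group_by_date(lines):
    """Return (date, description) pairs, one per dated paragraph."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        found = DATE_PREFIX.match(line)
        if found:
            # A new date starts a new entry; keep the text after the date.
            entries.append([found.group(1), [line[found.end():]]])
        elif entries:
            # No date at the start: attach the line to the latest entry.
            entries[-1][1].append(line)
    return [(date, " ".join(p for p in parts if p)) for date, parts in entries]
```

It returns a list of pairs rather than a dict because the same date can appear more than once in the sample (there are two "September 2013." entries). You could call it as `group_by_date(f)` on the open file, since iterating over a file yields its lines.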
