Extracting specific records from a text file and saving them to a new file in Python

Question

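I have a txt file containing information for thousands of receipts. There are two types:

Regular receipts

Summary receipts

I need to get all the summary receipts, together with their contents, and write them to a new file.

Below is what I have done so far, but all it does is copy everything into a new file.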

filtered = []
with open("sample.txt", "r+") as file: 
    for line in file:
        filtered.append(line.split(""" 
                    Company Name
                      A CITY         
                    Name of CITY              
                     Tin:00000     
                      #10000      
            N#00108235 Cashier ID#0000 
        - - - - - - - - - - - - - - - - - - - -
                Report(X-Report)         
        """))

    outputfile = open("output.txt","w") 
    for lines in filtered:
        outputfile.write(str(lines))

I'm very new to Python, so any tips or guidance would be much appreciated. TIA

Answer 1

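Do you only need to separate them by type? Based on your explanation, a simple solution is to read the contents of the file and look for the phrase "SUMMARY OF CHARGES" in it; if it is found, save the contents to a new file. A regex for any text containing the word abc would be .*abc.*, but because . does not match a newline by default and re.match only tries at the start of the string, that pattern would only inspect the first line of a multi-line file. With re.search, which scans the whole string, the pattern can simply be the phrase itself. If you have a separate file for each receipt, the code would look like this.

import re
# Read the whole receipt into one string.
with open("sample.txt", "r") as sfile:
    cont = sfile.read()
# re.search scans the entire text, not just the first line.
if re.search("SUMMARY OF CHARGES", cont):
    with open("outfile.txt", "w") as outfile:
        outfile.write(cont)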
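
To see why the choice of function matters for a file that is read in one go, here is a minimal sketch on a made-up three-line string (the sample text is an assumption, not taken from the real receipts). By default `.` does not match a newline and `re.match` only tries at the start of the string, so a phrase on a later line is found only by `re.search`, by the `in` operator, or by adding the `re.DOTALL` flag.

```python
import re

# Hypothetical receipt text; the phrase sits on the second line.
cont = "Company Name\nSUMMARY OF CHARGES\nTotal: 100\n"

# re.match anchors at position 0 and "." stops at the first newline.
matched_default = re.match(".*SUMMARY OF CHARGES.*", cont) is not None

# re.DOTALL lets "." cross newlines, so the same pattern now matches.
matched_dotall = re.match(".*SUMMARY OF CHARGES.*", cont, re.DOTALL) is not None

# re.search scans the whole string, so no ".*" padding is needed.
found_search = re.search("SUMMARY OF CHARGES", cont) is not None

# For a fixed phrase, a plain substring test is the simplest option.
found_in = "SUMMARY OF CHARGES" in cont

print(matched_default, matched_dotall, found_search, found_in)
```

For a fixed phrase like this one, the substring test is probably the easiest to read; a regex only pays off once the marker varies between receipts.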

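To separate the contents of the individual receipts, you can use regex groups. Write a regex that matches exactly one receipt, then create a group (your_regex)* and iterate over the matches to get all the receipts. Note that a repeated group only captures its last repetition, so in practice it is easier to apply the single-receipt pattern with re.finditer or re.findall.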
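
As a rough sketch of that idea, `re.finditer` can walk over every receipt using a single-receipt pattern. The pattern below assumes that each receipt begins with a `Company Name` line, as in the header shown in the question, and runs until the next such line or the end of the text. The sample text and the name `RECEIPT_RE` are made up for illustration, and the pattern will need adjusting to the real layout.

```python
import re

# Hypothetical sample: three receipts, the second one is a summary.
text = (
    "Company Name\nItem A 10\n"
    "Company Name\nSUMMARY OF CHARGES\nTotal 30\n"
    "Company Name\nItem B 20\n"
)

# One receipt = a "Company Name" header line plus everything up to the
# next header line or the end of the text (assumed layout).
RECEIPT_RE = re.compile(
    r"^[ \t]*Company Name.*?(?=^[ \t]*Company Name|\Z)",
    re.DOTALL | re.MULTILINE,
)

# Each match is the full text of one receipt.
receipts = [m.group(0) for m in RECEIPT_RE.finditer(text)]

# Keep only the receipts that mention the summary phrase.
summaries = [r for r in receipts if "SUMMARY OF CHARGES" in r]

print(len(receipts), len(summaries))
```

The lazy `.*?` together with the lookahead stops each match right before the next header, so consecutive receipts do not overlap.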

Answer 2

First, we can split the whole file into a list of receipts like this.

with open("sample.txt", "r") as file:
    receipts = file.read()

# Convert the text into a list of receipts.
# The separator should be tweaked to ensure that every receipt is split correctly;
# you can also use "FROM THE DATE  PERMIT TO USE".
receipts = receipts.split("- - - - -")
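
The choice of separator matters because `str.split` drops the separator and cuts wherever it occurs. The sketch below uses made-up text in which a dashed line sits inside each receipt, below the header, as in the question's sample. Splitting on the dashes then detaches each header from its body, whereas splitting on an assumed first line (`Company Name`, a hypothetical marker) and putting it back keeps each receipt whole.

```python
# Hypothetical text: two receipts, each with a dashed line under the header.
text = (
    "Company Name\n- - - - -\nItem A 10\n"
    "Company Name\n- - - - -\nSUMMARY OF CHARGES\nTotal 30\n"
)

# Splitting on the dashes cuts each receipt in two: the piece holding the
# summary body no longer contains its own header.
by_dashes = text.split("- - - - -")

# Splitting on the assumed start marker keeps header and body together.
START = "Company Name"
parts = text.split(START)
# parts[0] is whatever precedes the first marker (empty here); split()
# removed the marker itself, so it is added back to each piece.
receipts = [START + part for part in parts[1:]]

summaries = [r for r in receipts if "SUMMARY OF CHARGES" in r]

print(len(by_dashes), len(receipts), len(summaries))
```

Whichever marker is used, it is worth checking that it appears exactly once per receipt in the real file.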

Then we filter the list, keeping only the receipts that contain the summary marker.

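# Keep only the receipts that contain the summary marker.
summaries = [receipt for receipt in receipts if "SUMMARY OF CHARGE" in receipt]

# "a" appends to out.txt, so running this twice duplicates the output; use "w" to overwrite.
with open("out.txt", "a") as outfile:
    for summary in summaries:
        outfile.write(summary)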
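
The split-and-filter steps above read the entire file into memory. As an alternative, here is a minimal line-by-line sketch that groups lines into receipts as it goes. It is a different technique from the one in this answer, and it assumes that every receipt starts with a line whose stripped text is `Company Name` and that only summary receipts contain `SUMMARY OF CHARGES`. The function names `iter_receipts` and `extract_summaries` are hypothetical.

```python
import os
import tempfile

START_MARKER = "Company Name"    # assumed first line of every receipt
KEYWORD = "SUMMARY OF CHARGES"   # assumed to appear only in summary receipts


def iter_receipts(lines, start_marker=START_MARKER):
    """Yield each receipt as one string, starting a new one at every marker line."""
    current = []
    for line in lines:
        # A marker line closes the receipt collected so far (if any).
        if line.strip() == start_marker and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


def extract_summaries(src_path, dst_path, keyword=KEYWORD):
    """Write only the receipts containing keyword; return how many were written."""
    count = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for receipt in iter_receipts(src):
            if keyword in receipt:
                dst.write(receipt)
                count += 1
    return count


# Small demonstration on a temporary file with made-up content.
sample = (
    "  Company Name\nItem A 10\n"
    "  Company Name\nSUMMARY OF CHARGES\nTotal 30\n"
)
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "sample.txt")
    dst_path = os.path.join(tmp, "output.txt")
    with open(src_path, "w") as f:
        f.write(sample)
    written = extract_summaries(src_path, dst_path)
    with open(dst_path) as f:
        result = f.read()

print(written)
print(result)
```

The output file is opened with `"w"` here so that re-running the script replaces the previous result. Any lines before the first marker are treated as a receipt of their own and are dropped by the keyword filter unless they contain the phrase.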

2 answers

Answer 1
0 2019-10-23 09:55:05

Answer 2
0 2019-10-23 09:56:22
