Extract specific records from a text file and save to a new file in Python

I have a txt file that contains information for thousands of receipts. There are two types:

  • Regular receipts

  • Summary receipts

I need to get all Summary receipts, and only their contents, and write them into a new file.

The following is what I've done so far, but all it does is copy everything to a new file.

filtered = []
with open("sample.txt", "r+") as file: 
    for line in file:
        filtered.append(line.split(""" 
                    Company Name
                      A CITY         
                    Name of CITY              
                     Tin:00000     
                      #10000      
            N#00108235 Cashier ID#0000 
        - - - - - - - - - - - - - - - - - - - -
                Report(X-Report)         
        """))

    outputfile = open("output.txt","w") 
    for lines in filtered:
        outputfile.write(str(lines))

I'm quite new to Python, and any tips or guidance are much appreciated. TIA

Do you only need to separate them by type? Based on your explanation, a simple solution is to read the contents of the file and look for the phrase "SUMMARY OF CHARGES"; if it is found, save the content to a new file. A regex for any line containing the word abc is .*abc.* If you have a single file per receipt, the code would be something like this:

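import re

with open("sample.txt", "r") as sfile:
    cont = sfile.read()

# re.search scans the whole text; re.match only tests the start of the file,
# and "." does not cross newlines, so it would miss the phrase on later lines.
if re.search(r".*SUMMARY OF CHARGES.*", cont):
    with open("outfile.txt", "w") as outfile:
        outfile.write(cont)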

To separate the contents of individual receipts, you can use regex matching. Write the regex so that it matches exactly one receipt, then iterate over all of its matches to collect the matching receipts. Note that a repeated group such as (your_regex)* only keeps its last repetition, so re.finditer or re.findall is the usual way to iterate.
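A minimal sketch of that idea, under one loud assumption: every receipt starts with a header line containing "Company Name", as in the sample from the question (the real file may use a different header). The function names split_receipts and summary_receipts are hypothetical. Instead of a repeated group, the sketch uses re.split with a zero-width lookahead, which needs Python 3.7 or later.

```python
import re


def split_receipts(text, header="Company Name"):
    """Split text into receipts, each starting at a line containing header.

    ASSUMPTION: header appears once, at the top of every receipt.
    """
    # Zero-width match at the start of each header line, so the header
    # itself stays inside the receipt that follows it (Python 3.7+).
    boundary = re.compile(rf"(?m)^(?=[ \t]*{re.escape(header)})")
    return [chunk for chunk in boundary.split(text) if chunk.strip()]


def summary_receipts(text, marker="SUMMARY OF CHARGES", header="Company Name"):
    """Return only the receipts that contain marker."""
    return [r for r in split_receipts(text, header) if marker in r]
```

Reading sample.txt with file.read() and passing the result to summary_receipts would give a list of strings that can be written to the output file one after another.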

First, we can split the entire file into a list of receipts like this.

with open("sample.txt", "r") as file:  # read-only is enough here
    receipts = file.read()

# We convert it to a list of receipts
receipts = receipts.split("- - - - -")  # <=== This should be tweaked to ensure that every receipt is split correctly. You can also use "FROM THE DATE  PERMIT TO USE"

Then we filter our list using something that is unique to the summary receipts.

my_filter = lambda receipt: "SUMMARY OF CHARGE" in receipt
summaries = list(filter(my_filter, receipts)) 

with open("out.txt", "a") as outfile:
    for summary in summaries:
        outfile.write(summary) 
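Since the file holds thousands of receipts, the same split-and-filter idea can also be done line by line without loading everything into memory. This is only a sketch: it assumes each receipt begins with a line containing "Company Name" (taken from the question's sample, and possibly different in the real data), and the function name copy_summary_receipts is hypothetical.

```python
def copy_summary_receipts(src, dst, header="Company Name",
                          marker="SUMMARY OF CHARGES"):
    """Copy receipts containing marker from src to dst; return how many.

    ASSUMPTION: a line containing header marks the start of a new receipt.
    """
    kept = 0
    current = []  # lines of the receipt currently being read

    def flush(out):
        # Write the buffered receipt only if it is a summary receipt.
        nonlocal kept
        if any(marker in line for line in current):
            out.writelines(current)
            kept += 1
        current.clear()

    with open(src, "r") as infile, open(dst, "w") as outfile:
        for line in infile:
            if header in line and current:
                flush(outfile)  # the previous receipt is complete
            current.append(line)
        flush(outfile)  # do not forget the last receipt
    return kept
```

Calling copy_summary_receipts("sample.txt", "out.txt") would overwrite out.txt on each run, unlike the append mode used above.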
