Extract specific records from a text file and save to a new file in Python
I have a txt file which contains thousands of receipts. There are two types:
I need to get all Summary receipts, and only their contents, and write them into a new file.
The following is what I've done so far, but all it does is copy everything to a new file.
filtered = []
with open("sample.txt", "r+") as file:
    for line in file:
        filtered.append(line.split("""
Company Name
A CITY
Name of CITY
Tin:00000
#10000
N#00108235 Cashier ID#0000
- - - - - - - - - - - - - - - - - - - -
Report(X-Report)
"""))
outputfile = open("output.txt","w")
for lines in filtered:
    outputfile.write(str(lines))
I'm quite new to Python, and any tips or guidance are much appreciated. TIA
Do you only need to separate them according to type? A simple solution, as per your explanation, is to read the contents of the file and look for the phrase "SUMMARY OF CHARGES" in it; if it is found, save the content to a new file. A regex for anything with the word abc in it is .*abc.*
If you have a single file per receipt, the code would be something like this.
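import re
with open("sample.txt","r") as sfile:
    cont=sfile.read()
# re.search scans the whole text; re.match only tests the start of the
# file, and "." does not cross newlines, so it would miss later lines.
if re.search(".*SUMMARY OF CHARGES.*",cont):
    with open("outfile.txt","w") as outfile:
        outfile.write(cont)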
To separate the contents of individual receipts, you can use regex groups. Write the regex so that it matches exactly one receipt, then make a group (your_regex)* and iterate over the matches to get all the matching receipts.
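A minimal sketch of that idea, using re.finditer to walk over one match per receipt. The real receipt layout is not shown in full in the question, so the sample text and the assumption that every receipt starts with a line reading "Company Name" are hypothetical; summary_receipts and RECEIPT are illustrative names, not part of the original code.

```python
import re

# Hypothetical sample data: assumes each receipt begins with "Company Name".
sample = """Company Name
A CITY
- - - - -
Report(X-Report)
TOTAL 10
Company Name
A CITY
- - - - -
SUMMARY OF CHARGES
TOTAL 25
"""

# One receipt runs from a "Company Name" line up to (not including)
# the next such line, or to the end of the text.
RECEIPT = re.compile(
    r"^Company Name$.*?(?=^Company Name$|\Z)",
    re.DOTALL | re.MULTILINE,
)

def summary_receipts(text):
    """Return every receipt whose text contains 'SUMMARY OF CHARGES'."""
    return [
        m.group(0)
        for m in RECEIPT.finditer(text)
        if "SUMMARY OF CHARGES" in m.group(0)
    ]

summaries = summary_receipts(sample)
```

If the receipts in your file start with a different line, only the two "Company Name" occurrences in the pattern should need to change.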
First, we can split the entire file into a list of receipts like this.
with open("sample.txt", "r+") as file:
    receipts = file.read()
# We convert it to a list of receipts.
# The separator should be tweaked to ensure that we split at every receipt.
# You can also use "FROM THE DATE PERMIT TO USE".
receipts = receipts.split("- - - - -")
Then we filter our list using something that is unique to the summary receipts.
my_filter = lambda receipt: "SUMMARY OF CHARGE" in receipt
summaries = list(filter(my_filter, receipts))
with open("out.txt", "a") as outfile:
    for summary in summaries:
        outfile.write(summary)
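One way to handle the "tweak the separator" step is to group lines instead of calling str.split, so that the marker line stays attached to its receipt. The sketch below is only an illustration: it assumes every receipt begins with a line reading "Company Name" (taken from the header in the question), and split_receipts and write_summaries are hypothetical helper names.

```python
def split_receipts(lines, marker="Company Name"):
    """Group lines into receipts; a new receipt starts at each marker line."""
    receipts, current = [], []
    for line in lines:
        # A marker line closes the receipt collected so far.
        if line.strip() == marker and current:
            receipts.append("".join(current))
            current = []
        current.append(line)
    if current:
        receipts.append("".join(current))
    return receipts


def write_summaries(src, dst, keyword="SUMMARY OF CHARGES"):
    """Copy only the receipts containing keyword from src to dst.

    Returns the number of receipts written.
    """
    with open(src, "r") as infile:
        receipts = split_receipts(infile)
    summaries = [r for r in receipts if keyword in r]
    with open(dst, "w") as outfile:
        outfile.writelines(summaries)
    return len(summaries)
```

Calling write_summaries("sample.txt", "out.txt") would then overwrite out.txt with the summary receipts only; note that this uses "w" rather than the "a" mode above, so repeated runs do not append duplicates.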