
Pick parts from a txt file and copy to another file with python

I'm stuck here. I need to read a .txt file that contains a sequence of records, check which records I want, and copy those to a new file. The file content looks like this (this is just an example; the original file has more than 30,000 lines):

AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460 
99999|35|436 #end register
00000|46|316 #begin register
03000|SP|467
99999|33|130 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file

The records whose 03000 line contains the characters 'TO' must be written to the new file. Based on the example, the new file should look like this:

AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460 
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file

Code:

file = open("file.txt",'r')
newFile = open("newFile.txt","w")    
content = file.read()
file.close()
# here I need to check whether the 03000 line of a record contains 'TO'; if it does, copy the whole record (00000 through 99999) to the new file.

I did multiple searches and found nothing to help me. Thank you!

with open("file.txt",'r') as inFile, open("newFile.txt","w") as outFile:
    outFile.writelines(line for line in inFile 
                       if line.startswith("03000") and "TO" in line)

If you need the previous and the next line as well, then you have to iterate over inFile in triads. First define:

def gen_triad(lines, prev=None):
    after = current = next(lines)
    for after in lines:
        yield prev, current, after
        prev, current = current, after

Then proceed as before:

outFile.writelines(''.join(triad) for triad in gen_triad(inFile) 
                   if triad[1].startswith("03000") and "TO" in triad[1])
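The triad idea can be tried end to end on the sample data from the question. The following is a minimal sketch, not the answerer's exact code: it builds the triads with `zip` over a list instead of a hand-written generator, and `copy_with_neighbours` and `sample` are hypothetical names introduced here. It assumes the file fits in memory, which should hold for roughly 30,000 lines.

```python
import io

def gen_triad(lines):
    """Return an iterator of (previous, current, following) line triples."""
    lines = list(lines)  # materialise so the three offsets can be zipped together
    return zip(lines, lines[1:], lines[2:])

def copy_with_neighbours(in_file, out_file):
    """Write each 03000 line containing 'TO' with the line before and after it."""
    out_file.writelines(
        "".join(triad)
        for triad in gen_triad(in_file)
        if triad[1].startswith("03000") and "TO" in triad[1]
    )

# Sample data copied from the question.
sample = (
    "AAAAA|12|120 #begin file\n"
    "00000|46|150 #begin register\n"
    "03000|TO|460 \n"
    "99999|35|436 #end register\n"
    "00000|46|316 #begin register\n"
    "03000|SP|467\n"
    "99999|33|130 #end register\n"
    "00000|46|778 #begin register\n"
    "03000|TO|478\n"
    "99999|33|457 #end register\n"
    "ZZZZZ|15|111 #end file\n"
)

result = io.StringIO()
copy_with_neighbours(io.StringIO(sample), result)
print(result.getvalue(), end="")
```

This writes the two 'TO' records (six lines). The AAAAA and ZZZZZ lines are not part of any triad that matches, so they would have to be copied separately if they are wanted in the output.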

import re

# raw strings keep the regex escapes (\| and \d) from being read as string escapes
pat = (r'^00000\|\d+\|\d+.*\n'
       r'^03000\|TO\|\d+.*\n'
       r'^99999\|\d+\|\d+.*\n'
       r'|'
       r'^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*')
rag = re.compile(pat,re.MULTILINE)

with open('fifi.txt','r') as f,\
     open('newfifi.txt','w') as g:
    g.write(''.join(rag.findall(f.read())))

For files with additional lines between the lines beginning with 00000, 03000 and 99999, I didn't find simpler code than this:

import re

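# group 1: a whole record from 00000 to 99999; group 2: the AAAAA or ZZZZZ line
pat = (r'(^00000\|\d+\|\d+.*\n'
       r'(?:.*\n)+?'
       r'^99999\|\d+\|\d+.*\n)'
       r'|'
       r'(^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*)')
rag = re.compile(pat,re.MULTILINE)

# matches a record only if it contains a 03000|TO| line
pit = (r'^00000\|.+?^03000\|TO\|\d+.+?^99999\|')
rig = re.compile(pit,re.DOTALL|re.MULTILINE)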

def yi(text):
    for g1,g2 in rag.findall(text):
        if g2:
            yield g2
        elif rig.match(g1):
            yield g1

with open('fifi.txt','r') as f,\
     open('newfifi.txt','w') as g:
    g.write(''.join(yi(f.read())))
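For comparison, the same filtering can be sketched without regular expressions by buffering each record and deciding at its 99999 line whether to emit it. This is an illustrative sketch only: `filter_records`, `code` and `wanted` are hypothetical names, and it assumes every record starts with a 00000 line, ends with a 99999 line, and that lines outside a record (such as AAAAA and ZZZZZ) should be copied unchanged.

```python
def filter_records(lines, code="03000", wanted="TO"):
    """Yield lines outside records, plus every 00000..99999 record whose
    `code` line has `wanted` as its second '|'-separated field."""
    record = None   # None while outside a record, else the buffered lines
    keep = False
    for line in lines:
        prefix = line[:5]
        if prefix == "00000":           # a record starts: open a fresh buffer
            record, keep = [line], False
        elif record is not None:
            record.append(line)
            fields = line.split("|")
            if prefix == code and len(fields) > 1 and fields[1] == wanted:
                keep = True
            if prefix == "99999":       # record ends: emit it only if flagged
                if keep:
                    yield from record
                record = None
        else:                           # outside any record (AAAAA, ZZZZZ)
            yield line

# Sample with an extra, made-up 01000 line inside each record.
sample = [
    "AAAAA|12|120 #begin file\n",
    "00000|46|150 #begin register\n",
    "01000|XX|001\n",
    "03000|TO|460 \n",
    "99999|35|436 #end register\n",
    "00000|46|316 #begin register\n",
    "01000|XX|002\n",
    "03000|SP|467\n",
    "99999|33|130 #end register\n",
    "ZZZZZ|15|111 #end file\n",
]
print("".join(filter_records(sample)), end="")
```

Because it is a generator over lines, it can be fed an open file directly and passed to `writelines` on the output file, so the input never has to be read into memory at once.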
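
file = open("file.txt",'r')
newFile = open("newFile.txt","w")    
content = file.readlines()
file.close()
# keep only the 03000 lines that contain "TO"
newFile.writelines(filter(lambda x:x.startswith("03000") and "TO" in x,content))
newFile.close()  # close the output so the data is flushed to disk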

This seems to work. The other answers seem to write out only the lines that contain '03000|TO|', but you have to write out the lines before and after them as well.

import sys
# ---------------------------------------------------------------
# import file
file_name = sys.argv[1]
file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name
file = open(file_path,"r")
# ---------------------------------------------------------------
# create output files
output_file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name + '.out'
output_file = open(output_file_path,"w")
# create output files

# ---------------------------------------------------------------
# process file

temp = ''
temp_out = ''
good_write = False
bad_write = False
for line in file:
    if line[:5] == 'AAAAA':
        temp_out += line 
    elif line[:5] == 'ZZZZZ':
        temp_out += line
    elif good_write:
        temp += line
        temp_out += temp
        temp = ''
        good_write = False
    elif bad_write:
        bad_write = False
        temp = ''
    elif line[:5] == '03000':
        if line[6:8] != 'TO':
            temp = ''
            bad_write = True
        else:
            good_write = True
            temp += line
            temp_out += temp 
            temp = ''
    else:
        temp += line

output_file.write(temp_out)
output_file.close()
file.close()

Output:

AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460 
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file

Does it have to be python? These shell commands would do the same thing in a pinch.

head -1 inputfile.txt > outputfile.txt
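# --no-group-separator (GNU grep) suppresses the "--" lines that -C otherwise prints between non-adjacent matches
grep -C 1 --no-group-separator "03000|TO" inputfile.txt >> outputfile.txt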
tail -1 inputfile.txt >> outputfile.txt

# Whenever I have to parse text files I prefer to use regular expressions
# You can also customize the matching criteria if you want to
import re
what_is_being_searched = re.compile("^03000.*TO")

# don't use "file" as a variable name: in Python 2 it shadows the
# built-in file type
with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    for this_line in source_file:
        if what_is_being_searched.match(this_line):
            destination_file.write(this_line)

And for those who prefer a more compact representation:

import re

with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    destination_file.writelines(this_line for this_line in source_file 
                                if re.match("^03000.*TO", this_line))
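One thing to keep in mind with the pattern `^03000.*TO` is that it accepts 'TO' anywhere after the prefix, not only in the second field. If only the second field should be tested, a field-based check is one alternative. The sketch below is an assumption about what "have the characters 'TO'" means in the question; `is_wanted` and the `tricky` line are made up for illustration.

```python
import re

def is_wanted(line, code="03000", wanted="TO"):
    """True if the first field equals `code` and the second equals `wanted`."""
    fields = line.rstrip("\n").split("|")
    return len(fields) >= 2 and fields[0] == code and fields[1] == wanted

loose = re.compile("^03000.*TO")

tricky = "03000|SP|467 TOTAL\n"       # made-up line: 'TO' appears in the third field
print(bool(loose.match(tricky)))      # True: the regex accepts it
print(is_wanted(tricky))              # False: the second field is 'SP'
print(is_wanted("03000|TO|460 \n"))   # True
```

Either check can be dropped into the generator expression passed to `writelines`.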

Code:

fileName = '1'

## step 1: parse the file into (field1, field2, rest) tuples
parsedFile = []
fil = open(fileName,'r')
for i in fil:
    firstPipe = i.find('|')
    secondPipe = i.find('|',firstPipe+1)
    ## rstrip also copes with a last line that has no trailing newline
    tuple1 = (i[:firstPipe],
              i[firstPipe+1:secondPipe],
              i[secondPipe+1:].rstrip('\n'))
    parsedFile.append(tuple1)
fil.close()

## search criteria:
searchFirst = '03000'
searchString = 'TO'  ##can be changed if and when required

## step 2: use the parsed contents to write the new file
filout = open('newFile','w')

stringToWrite = parsedFile[0][0] + '|' + parsedFile[0][1] + '|' + parsedFile[0][2] + '\n'
filout.write(stringToWrite)  ##to write the first entry

for i in range(1,len(parsedFile)):
    if parsedFile[i][1] == searchString and parsedFile[i][0] == searchFirst:
        ## write the previous, the matching and the next line
        for j in range(-1,2,1):
            stringToWrite = parsedFile[i+j][0] + '|' + parsedFile[i+j][1] + '|' + parsedFile[i+j][2] + '\n'
            filout.write(stringToWrite)

stringToWrite = parsedFile[-1][0] + '|' + parsedFile[-1][1] + '|' + parsedFile[-1][2] + '\n'
filout.write(stringToWrite)  ##to write the last entry
filout.close()

I know that this solution may be a bit long, but it is quite easy to understand and it seems an intuitive way to do it. I have already checked it with the data that you provided and it works.

Please tell me if you need more explanation of the code and I will add it.

The tips (Beasley and Joran elyase) are very interesting, but they only get the contents of the 03000 line. I would like to get the contents of the lines from 00000 to 99999. I even managed to do it here, but I am not satisfied; I wanted something cleaner. See how I did it:

    file = open(url,'r')
    newFile = open("newFile.txt",'w')
    lines = file.readlines()
    file.close()
    lineTemp = []
    state = ''
    for line in lines:
        lineTemp.append(line)
        if line[0:5] == '03000':
            # column position in the real file; in the example above it would be line[6:8]
            state = line[21:23]
        if line[0:5] == '99999':
            if state == 'TO':
                newFile.writelines(lineTemp)
            # start a fresh buffer for the next record, whether or not this one was written
            lineTemp = []
    newFile.close()

Suggestions... Thanks to all!
