Split a txt file into two files using the first column value in Python

Question

I would like to split an INPUT.txt file into two .txt files (header and data) by the value of the first column. Lines before "H1000" should be saved in header.txt, and lines from "H1000" onward should be saved in data.txt.

INPUT.txt

H0002   Version 78                                                                                                                      
H0003   Date_generated  5-Aug-81                                                                                                                        
H0004   Reporting_period_end_date   09-Jun-81                                                                                                                       
H1000   State   WAAAA                                                                                                                       
H1002   Teno/Combno Z70/4000                                                                                                                        
H1003   Tener   Magn Reso NL    
H1004   LLD                                                                                     
D   AC056SCO1   NRM 11  12  6483516 25.98   0.4 1.35    0.25    0.51    0.01    0.06    0.1 56.23   2.29

With the output files being:

header.txt

H0002   Version 78                                                                                                                      
H0003   Date_generated  5-Aug-81                                                                                                                        
H0004   Reporting_period_end_date   09-Jun-81

data.txt

H1000   State   WAAAA                                                                                                                       
H1002   Teno/Combno Z70/4000                                                                                                                        
H1003   Tener   Magn Reso NL    
H1004   LLD                                                                                     
D   AC056SCO1   NRM 11  12  6483516 25.98   0.4 1.35    0.25    0.51    0.01    0.06    0.1 56.23   2.29

A couple of problems I am facing:

The position of "H1000" is dynamic: it differs between txt files. In the second input file (see InputFile2 below), "H1000" is on a different line. So my Python code first tries to find the position of H1000.
I am using the position of H1000 to separate the header and data files, but the logic does not split the files correctly.

My Python code:

if path_txt.is_file():
    txt_files = [Path(path_txt)]
else:
    txt_files = list(Path(path_txt).glob("*.txt"))

for fn in txt_files:
    with open(fn) as fd_read:
        for line in fd_read:
            h_value = line.split(maxsplit=1)[0]
            value = int(h_value[1:])  # Finding the position of H1000

        splitLen = 5  # Position of H1000
        HeaderBase = 'Header.txt'  # Header.txt
        DataBase = 'Data.txt'  # Data.txt

        with open(fn, 'r') as fp:
            input_list = fp.readlines()
            # to skip empties: input_list = [l for l in fp if l.strip()]

        for i in range(0, len(input_list), splitLen):
            with open(HeaderBase, 'w') as fp:
                fp.write(''.join(input_list[0:(i-1)]))  # Header.txt
            with open(DataBase, 'w') as fp:
                fp.write(''.join(input_list[i:]))  # Data.txt

None of my logic is working. Any help is appreciated, as I am stuck on how to make this logic work.

InputFile2

H0002   Version 9                                                                                                                       
H0003   Date_generated  5-Aug-81                                                                                                                        
H0004   Reporting_period_end_date   09-Jun-99                                                                                                                       
H0005   State   WAAAAA                                                                                                                      
H1000   Tene_no/Combined_rept_no    E79/38975                                                                                                                       
H1001   Tene_holder Magne Resources NL  
D   abc3SCO1    NORM    26  27  9483531 4.15    0.05    0.65    0.02    0.15    0   0.04    0.09    87.51   0.29

The Python code and txt files are attached here.

Answer 1

Your code suffers from numerous issues:

You don't actually find the position of H1000. Nothing in the code records it.
You set the split to 5, disregarding the position of H1000.
I don't understand your range() call. You're hopping from start to end in 5-line jumps?
For every jump i, you write everything from the start of the document up to i to header.txt and the rest to data.txt. That means you're writing the entire document multiple times.
You convert path_txt to a Path object, but then use it like a plain string.
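
To illustrate the first two points, here is a minimal sketch of how the position of H1000 could be located instead of hard-coded. The helper name find_token_index is hypothetical, and the sketch assumes the token has to match the first whitespace-separated column exactly. The sample lines are shortened from INPUT.txt in the question.

```python
def find_token_index(lines, token="H1000"):
    """Return the index of the first line whose first column equals token, or None."""
    for index, line in enumerate(lines):
        columns = line.split(maxsplit=1)
        # Blank lines produce an empty list, so guard before indexing.
        if columns and columns[0] == token:
            return index
    return None


sample = [
    "H0002   Version 78\n",
    "H0003   Date_generated  5-Aug-81\n",
    "H0004   Reporting_period_end_date   09-Jun-81\n",
    "H1000   State   WAAAA\n",
    "H1002   Teno/Combno Z70/4000\n",
]

split_at = find_token_index(sample)
if split_at is None:
    # Assumption: with no token line, everything counts as header.
    split_at = len(sample)

header_lines = sample[:split_at]  # lines before H1000
data_lines = sample[split_at:]    # H1000 and everything after it
```

With the index known, two slices replace the range() loop, and each output file only needs to be written once.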

I couldn't figure out what to do when a directory is passed, since I don't believe you want all headers in one file and all data in another.
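
One possible way to handle a directory, offered only as a sketch: write a separate pair of output files per input file. The function name split_path, the out_dir parameter, and the "<stem>_header.txt" / "<stem>_data.txt" naming scheme are all assumptions, not something the question asks for.

```python
from pathlib import Path


def split_path(path_txt, out_dir, token="H1000"):
    """Split one file, or every *.txt file in a directory, at the token line."""
    path_txt = Path(path_txt)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    txt_files = [path_txt] if path_txt.is_file() else sorted(path_txt.glob("*.txt"))

    for fn in txt_files:
        with open(fn) as f:
            lines = f.readlines()
        # Index of the first line whose first column is the token;
        # if the token never appears, everything is treated as header.
        split_at = next(
            (i for i, line in enumerate(lines) if line.split(maxsplit=1)[:1] == [token]),
            len(lines),
        )
        # Hypothetical naming scheme: one header/data pair per input file.
        (out_dir / f"{fn.stem}_header.txt").write_text("".join(lines[:split_at]))
        (out_dir / f"{fn.stem}_data.txt").write_text("".join(lines[split_at:]))
```

It is safer to point out_dir at a directory other than the input one, otherwise a second run would pick up the generated files as inputs.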

Fixed code for a single file:

SPLIT_TOKEN = "H1000"

def split_file(path, header_path="header.txt", data_path="data.txt"):
    """Split a file into a header file and a data file upon encountering a token."""
    header = []
    data = []
    with open(path, "r") as f:
        for line in f:
            if line.startswith(SPLIT_TOKEN):
                data.append(line)  # The line with the token starts the data
                data.extend(f)     # Everything after it is data as well
                break
            header.append(line)

    # If the token never appears (or the file is empty), data stays empty.
    with open(header_path, "w") as f:
        f.writelines(header)
    with open(data_path, "w") as f:
        f.writelines(data)
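
For large inputs, a streaming variant may be preferable, since it never holds the whole file in memory. This is only a sketch: the name split_file_streaming and the token parameter are hypothetical, and it assumes the token must equal the first whitespace-separated column exactly, so a line starting with something like H10001 does not trigger the split.

```python
def split_file_streaming(path, header_path="header.txt", data_path="data.txt",
                         token="H1000"):
    """Copy lines to header_path until the token line, then to data_path."""
    with open(path) as src, \
            open(header_path, "w") as header, \
            open(data_path, "w") as data:
        target = header
        for line in src:
            # Switch the output once, on the first line whose first column
            # is exactly the token; blank lines yield an empty list here.
            if target is header and line.split(maxsplit=1)[:1] == [token]:
                target = data
            target.write(line)
```

If the token never appears, the whole input ends up in the header file and the data file is left empty.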
