I would like to split an INPUT.txt file into two .txt files (header and data) based on the value in the first column. Lines before "H1000" should be saved to header.txt, and the "H1000" line and everything after it should be saved to data.txt.
INPUT.txt
H0002 Version 78
H0003 Date_generated 5-Aug-81
H0004 Reporting_period_end_date 09-Jun-81
H1000 State WAAAA
H1002 Teno/Combno Z70/4000
H1003 Tener Magn Reso NL
H1004 LLD
D AC056SCO1 NRM 11 12 6483516 25.98 0.4 1.35 0.25 0.51 0.01 0.06 0.1 56.23 2.29
With the output files being:
header.txt
H0002 Version 78
H0003 Date_generated 5-Aug-81
H0004 Reporting_period_end_date 09-Jun-81
data.txt
H1000 State WAAAA
H1002 Teno/Combno Z70/4000
H1003 Tener Magn Reso NL
H1004 LLD
D AC056SCO1 NRM 11 12 6483516 25.98 0.4 1.35 0.25 0.51 0.01 0.06 0.1 56.23 2.29
A couple of problems I am facing:
The position of "H1000" varies between files. In the second input file (see Input File 2 below), "H1000" is on a different line, so my Python code first tries to find the position of H1000.
I then use that position to separate the header and data files, but the logic does not split the files correctly.
My Python code:
if path_txt.is_file():
    txt_files = [Path(path_txt)]
else:
    txt_files = list(Path(path_txt).glob("*.txt"))
for fn in txt_files:
    with open(fn) as fd_read:
        for line in fd_read:
            h_value = line.split(maxsplit=1)[0]
            value = int(h_value[1:])  # Finding the position of H1000
    splitLen = 5  # Position of H1000
    HeaderBase = 'Header.txt'  # Header.txt
    DataBase = 'Data.txt'  # Data.txt
    with open(fn, 'r') as fp:
        input_list = fp.readlines()
        # to skip empties: input_list = [l for l in fp if l.strip()]
    for i in range(0, len(input_list), splitLen):
        with open(HeaderBase, 'w') as fp:
            fp.write(''.join(input_list[0:(i-1)]))  # Header.txt
        with open(DataBase, 'w') as fp:
            fp.write(''.join(input_list[i:]))  # Data.txt
None of my logic is working. Any help would be appreciated, as I am stuck on how to make this work.
Input File 2
H0002 Version 9
H0003 Date_generated 5-Aug-81
H0004 Reporting_period_end_date 09-Jun-99
H0005 State WAAAAA
H1000 Tene_no/Combined_rept_no E79/38975
H1001 Tene_holder Magne Resources NL
D abc3SCO1 NORM 26 27 9483531 4.15 0.05 0.65 0.02 0.15 0 0.04 0.09 87.51 0.29
Python code and txt file attached here
Your code suffers from numerous issues:
- You never actually search for H1000. I don't see it written anywhere in the code.
- You split at a hardcoded 5, disregarding the position of H1000.
- You misuse the range() function. You're hopping from start to end in 5-line jumps?
- For every i, you write everything from the start of the document up to i to header.txt and the rest to data.txt. That means you're writing the entire document multiple times.
- You convert path_txt to a Path object, but then use it like a regular string.
I couldn't figure out what should happen when a directory is passed, since I don't believe you want all headers in one file and all data in another.
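To make the range() and slicing issues listed above concrete, here is a small sketch that mimics the loop from the question on ten stand-in lines. The list contents and the names split_len and writes are made up for illustration. Instead of writing files, it records how many lines each iteration would have written.

```python
# Ten stand-in lines; the real file contents do not matter here.
input_list = [f"line {n}\n" for n in range(10)]
split_len = 5  # the hardcoded step from the question

# Record what each loop iteration would write: (i, header lines, data lines).
writes = []
for i in range(0, len(input_list), split_len):
    header = input_list[0:(i - 1)]  # for i == 0 this is input_list[0:-1]
    data = input_list[i:]
    writes.append((i, len(header), len(data)))

print(writes)  # [(0, 9, 10), (5, 4, 5)]
```

The loop runs once per 5-line hop, and because the files are opened with 'w', each pass overwrites the previous one, so only the last pass survives. The i - 1 slice end also drops one line (index 4 in the second pass), and none of this depends on where H1000 actually is.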
Fixed code for a single file:
SPLIT_TOKEN = "H1000"


def split_file(path, header_path="header.txt", data_path="data.txt"):
    """Split a file into a header and a data file upon encountering a token."""
    header = []
    data = []
    with open(path, "r") as f:
        for line in f:
            if line.startswith(SPLIT_TOKEN):
                data.append(line)  # The line with the token starts the data
                data.extend(f)  # Everything after it is data as well
                break
            header.append(line)
    # If the token never appears, every line stays in the header
    with open(header_path, "w") as f:
        f.writelines(header)
    with open(data_path, "w") as f:
        f.writelines(data)
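For the directory case, one possible approach is to write a separate header and data file per input file. The sketch below is only an assumption about what might be wanted: the helper names split_lines and split_path, the out_dir parameter, and the <stem>_header.txt / <stem>_data.txt naming are all hypothetical and not part of the question.

```python
from pathlib import Path

SPLIT_TOKEN = "H1000"


def split_lines(lines, token=SPLIT_TOKEN):
    """Return (header, data); data starts at the first line starting with token."""
    lines = list(lines)
    for index, line in enumerate(lines):
        if line.startswith(token):
            return lines[:index], lines[index:]
    return lines, []  # token never found: everything counts as header


def split_path(path_txt, out_dir):
    """Split one file, or every *.txt file directly inside a directory.

    Output names (<stem>_header.txt, <stem>_data.txt) are an assumed convention.
    """
    path_txt = Path(path_txt)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Build the list of inputs before writing anything.
    sources = [path_txt] if path_txt.is_file() else sorted(path_txt.glob("*.txt"))
    written = []
    for source in sources:
        with open(source) as f:
            header, data = split_lines(f)
        header_path = out_dir / f"{source.stem}_header.txt"
        data_path = out_dir / f"{source.stem}_data.txt"
        header_path.write_text("".join(header))
        data_path.write_text("".join(data))
        written.append((header_path, data_path))
    return written
```

Calling split_path("INPUT.txt", "out") would handle a single file, and split_path("some_folder", "out") every .txt file in that folder. Keeping out_dir separate from the input folder avoids the generated files being picked up as inputs on a second run.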