Parsing a tab-separated file
I have a tab-separated file (raw.txt) in the following format:
type A1 A2 A3 A4 ....
params int char char char ...
data 1 abc cde fgh ...
type B1 B2 B3 B4 ....
feature int char char char ...
data 2 aaa bbb ccc ...
type C1 C2 C3 C4 ....
stats int int char char ...
data 2 11 aa bb ...
data 3 12 cc cc ...
data 4 13 dd dd ...
data 5 14 ee ee ...
... ... ... ... ... ...
I want to parse this file, create a folder named after the file, and inside that folder create a separate file for each type. A new file should be started each time a line beginning with type is seen, for example:
/raw/file1
A1 A2 A3 A4 ....
int char char char ...
1 abc cde fgh ...
/raw/file2
B1 B2 B3 B4 ....
int char char char ...
2 aaa bbb ccc ...
/raw/file3
C1 C2 C3 C4 ....
int int char char ...
2 11 aa bb ...
3 12 cc cc ...
4 13 dd dd ...
5 14 ee ee ...
... ... ... ... ...
and so on. I also want to create dictionaries like:
dict1 = {A1:['int', [1]], A2:['char', ['abc']], ...}
dict2 = {B1:['int', [2]], B2:['char', ['aaa']], ...}
dict3 = {C1:['int', [2, 3, 4, 5]], C2:['int', [11, 12, 13, 14, ...]], ...}
How can I do this? The file is very large, parsing it hangs the window, and I cannot figure out how to produce this output from the file.
Here is code to parse the file and generate the new files. The newfile list temporarily stores the rows for each type before they are written to a file. A line that starts with 'type' begins a new block in newfile; if newfile already has contents, they are written to a file first. The inc variable is incremented each time writefile() is called and is used as the suffix of the file name. In writefile() I used str.format() to build the file name and to write each row left-justified with a width of 6 per value, taking arbitrarily many values from linelist.
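As a quick standalone check of the row-formatting expression used in writefile(), the sketch below wraps it in a helper. The names format_row and format_row_tabs are made up for this illustration and are not part of the answer's code. Note that '{:<6}' sets a minimum width, not a maximum, so a value longer than 6 characters runs straight into the next one; joining with tabs is one possible alternative if the output should stay tab-separated.

```python
def format_row(linelist, width=6):
    """Left-justify every value to at least `width` characters (hypothetical helper)."""
    spec = '{:<' + str(width) + '}'
    return (spec * len(linelist)).format(*linelist)


def format_row_tabs(linelist):
    """Alternative sketch: keep the output tab-separated instead of padded."""
    return '\t'.join(linelist)


print(repr(format_row(['A1', 'A2'])))        # 'A1    A2    '
print(repr(format_row(['abcdefgh', 'x'])))   # 'abcdefghx     ' (no gap after the long value)
print(repr(format_row_tabs(['abcdefgh', 'x'])))
```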
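import os

def writefile(newfilelist, suffix):
    # Write one block to raw/file<suffix>, each value left-justified to width 6.
    with open(os.path.join('raw', 'file{}'.format(suffix)), 'w') as f:
        for linelist in newfilelist:
            f.write(('{:<6}' * len(linelist)).format(*linelist) + '\n')

os.makedirs('raw', exist_ok=True)  # unlike os.mkdir, does not fail if 'raw' already exists
newfile = []
inc = 0
with open('raw.txt') as file:
    for line in file:  # iterate line by line so the whole file is never held in memory
        linelist = line.split()
        if not linelist:
            continue  # skip blank lines instead of failing on linelist[0]
        if linelist[0] == 'type' and newfile:
            # a new block starts: flush the previous one to its own file
            inc += 1
            writefile(newfile, inc)
            newfile = []
        newfile.append(linelist[1:])  # drop the leading label (type, params, data, ...)
if newfile:
    # flush the last block
    inc += 1
    writefile(newfile, inc)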
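The code above writes the files but does not build the dictionaries asked for in the question. A minimal sketch of one way to do that follows. The function names build_dicts and convert are hypothetical, and the sketch assumes that the line right after each type line holds the column types (whatever its label: params, feature, stats), that every later line in the block is a data row, and that only columns declared as int should be converted to integers. It splits on whitespace, like the code above, so it would need line.rstrip('\n').split('\t') if values can contain spaces or be empty.

```python
def convert(value, type_name):
    """Cast to int when the declared column type is 'int'; keep other values as strings."""
    return int(value) if type_name == 'int' else value


def build_dicts(lines):
    """Return one dict per 'type' block, shaped {column: [type_name, [values, ...]]}."""
    dicts = []
    names = []
    current = None
    expect_types = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, values = fields[0], fields[1:]
        if label == 'type':
            # start a new block keyed by the column names
            names = values
            current = {name: [None, []] for name in names}
            dicts.append(current)
            expect_types = True
        elif current is None:
            continue  # ignore anything before the first 'type' line
        elif expect_types:
            # the line right after 'type' is assumed to carry the column types
            for name, type_name in zip(names, values):
                current[name][0] = type_name
            expect_types = False
        else:
            # data row: append each value to its column's list
            for name, value in zip(names, values):
                current[name][1].append(convert(value, current[name][0]))
    return dicts


sample = [
    "type\tC1\tC2\tC3",
    "stats\tint\tint\tchar",
    "data\t2\t11\taa",
    "data\t3\t12\tcc",
]
print(build_dicts(sample))
```

With a real file this could be called as `with open('raw.txt') as f: dicts = build_dicts(f)`, which still reads one line at a time. The dictionaries themselves hold every value, though, so for a very large file they may not fit in memory; in that case it may be better to build and process one block's dictionary at a time.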