
Parsing a tab-separated file

I have a tab-separated file (raw.txt) with the following format:

type    A1    A2    A3    A4    ....
params  int   char  char  char  ...
data    1     abc   cde   fgh   ...
type    B1    B2    B3    B4    ....
feature int   char  char  char  ...
data    2     aaa   bbb   ccc   ...   
type    C1    C2    C3    C4    ....
stats   int   int   char  char  ...
data    2     11    aa    bb    ...
data    3     12    cc    cc    ...
data    4     13    dd    dd    ...
data    5     14    ee    ee    ...
...     ...   ...   ...   ...   ...

I want to parse this file, create a folder named after the file, and inside that folder create a separate file for each type. A new file should start whenever a line beginning with type is seen, for example:

/raw/file1
A1    A2    A3    A4    ....
int   char  char  char  ...
1     abc   cde   fgh   ...

/raw/file2
B1    B2    B3    B4    ....
int   char  char  char  ...
2     aaa   bbb   ccc   ...

/raw/file3
C1    C2    C3    C4    ....
int   int   char  char  ...
2     11    aa    bb    ...
3     12    cc    cc    ...
4     13    dd    dd    ...
5     14    ee    ee    ...
...   ...   ...   ...   ...

and so on. I also want to create dictionaries like:

dict1 = {A1:['int', [1]], A2:['char', ['abc']], ...}
dict2 = {B1:['int', [2]], B2:['char', ['aaa']], ...}
dict3 = {C1:['int', [2, 3, 4, 5]], C2:['int', [11, 12, 13, 14, ...]], ...}

How can I do this? The file is very large, parsing it makes the window hang, and I can't figure out how to get this output from the file.

Here is code to parse the file and generate the new files. The newfile list temporarily stores the rows for the current type before they are written to a file. A line that starts with 'type' begins a new block: if newfile already holds rows from the previous block, they are written to a file first, and then newfile starts collecting rows for the new type. The inc variable is incremented before each call to writefile() and is used as the suffix of the filename. In writefile() I used str.format() both to build the file name and to write each row left-justified with a minimum width of 6, taking arbitrarily many values from linelist.

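import os

def writefile(newfilelist, suffix, folder='raw'):
  # Write one block to <folder>/file<suffix>, each value left-justified in a 6-character column.
  with open(os.path.join(folder, 'file{}'.format(suffix)), 'w') as f:
    for linelist in newfilelist:
      f.write(('{:<6}'*len(linelist)).format(*linelist) + '\n')

os.makedirs('raw', exist_ok=True)  # create the output folder; no error if it already exists
with open('raw.txt') as file:
  newfile = []  # rows of the block currently being collected
  inc = 0       # number of files written so far, used as the filename suffix
  for line in file:
    linelist = line.split()
    if not linelist:
      continue  # skip blank lines, which would otherwise raise IndexError below
    if linelist[0] == 'type' and newfile:
      # a new block starts: write the previous block to disk first
      inc += 1
      writefile(newfile, inc)
      newfile = []
    newfile.append(linelist[1:])  # drop the leading tag (type, params, data, ...)
  if newfile:
    # write the last block, which has no following 'type' line to trigger it
    inc += 1
    writefile(newfile, inc)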
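
The code above does not build the dictionaries asked for in the question. One possible way to do that is sketched below. The names iter_type_dicts and convert are hypothetical helpers, not part of the original answer. The sketch assumes that the first line after each 'type' line holds the column types (whatever its leading tag is), that only 'int' values need converting, and that fields never contain spaces, so line.split() is safe. It yields one dictionary per block rather than keeping the whole file in memory, which may help with very large inputs.

```python
def convert(value, type_name):
    # Assumption: only 'int' columns are converted; every other type stays a string.
    return int(value) if type_name == 'int' else value


def iter_type_dicts(lines):
    """Yield one dict per 'type' block: {column_name: [type_name, [values, ...]]}."""
    names = types = columns = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        tag, rest = fields[0], fields[1:]
        if tag == 'type':
            # A new block starts: emit the previous one if it was complete.
            if names is not None and types is not None:
                yield {n: [t, c] for n, t, c in zip(names, types, columns)}
            names, types, columns = rest, None, None
        elif names is None:
            continue  # ignore anything that appears before the first 'type' line
        elif types is None:
            # Assumption: the line right after 'type' lists the column types.
            types = rest
            columns = [[] for _ in names]
        elif tag == 'data':
            for column, type_name, value in zip(columns, types, rest):
                column.append(convert(value, type_name))
    # Emit the last block, which has no following 'type' line.
    if names is not None and types is not None:
        yield {n: [t, c] for n, t, c in zip(names, types, columns)}
```

Because iter_type_dicts accepts any iterable of lines, an open file object can be passed to it directly and the dictionaries consumed one at a time in a for loop. Keys are the column names as strings ('A1', 'A2', and so on).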
