
Read lines between empty lines of a data file and write them to new files

I have a big text data file, for example:

#01textline1
1 2 3 4 5 6
2 3 5 6 7 3
3 5 6 7 6 4
4 6 7 8 9 9

1 2 3 6 4 7
3 5 7 7 8 4
4 6 6 7 8 5

3 4 5 6 7 8
4 6 7 8 8 9
..
..

I want to extract the data between the empty lines and write each block to a new file. I don't know how many empty lines the file contains, so I also don't know how many new files I will be writing, which makes writing them seem very hard. Can anyone guide me? Thank you. I hope my question is clear.

Unless your file is very large, read it all at once and split it into individual sections using re, splitting on two or more consecutive whitespace characters:

import re

with open("in.txt") as f:
    # Split wherever two or more whitespace characters occur in a row (e.g. a blank line)
    lines = re.split(r"\s{2,}", f.read())
    print(lines)
['#01textline1\n1 2 3 4 5 6\n2 3 5 6 7 3\n3 5 6 7 6 4\n4 6 7 8 9 9', '1 2 3 6 4 7\n3 5 7 7 8 4\n4 6 6 7 8 5', '3 4 5 6 7 8\n4 6 7 8 8 9']

Just iterate over lines and write one new file per iteration.
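As a minimal sketch of that iteration step: the function name write_sections, the "section-" prefix and the ".txt" suffix below are made up for illustration and are not from the question. It also swaps in a slightly stricter pattern, \n\s*\n, so that two spaces inside a data line do not count as a separator. Like the approach above, it assumes the whole file fits in memory.

```python
import re
from pathlib import Path


def write_sections(text, out_dir, prefix="section-"):
    """Split text on blank lines and write each block to its own file.

    Hypothetical helper: the name, prefix and .txt suffix are assumptions.
    """
    # \n\s*\n matches a newline, optional whitespace, and another newline,
    # so a run of several blank lines counts as a single separator.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, block in enumerate(blocks):
        path = out_dir / f"{prefix}{i}.txt"
        path.write_text(block + "\n")
        paths.append(path)
    return paths
```

Calling write_sections(open("in.txt").read(), "out") would then create out/section-0.txt, out/section-1.txt and so on, one per block, without needing to know the number of blocks in advance.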

Reading files is not . Please choose more appropriate tags...

Splitting a file on empty lines is trivial:

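num = 0
out = open("file-0", "w")

with open("file") as src:
    for line in src:
        if line == "\n":
            # Blank line: close the current output file and start the next one
            num = num + 1
            out.close()
            out = open("file-" + str(num), "w")  # str() is needed: num is an int
            continue
        out.write(line)

out.close()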

Because this approach reads just one line at a time, file size does not matter. It should process data as fast as your disk can handle it, with near-constant memory usage.

Perl would have had a neat trick, because you can set the input record separator to two newlines via $/="\n\n"; and then process the data one record at a time as usual. I could not find anything similar in Python, but the "split on empty lines" hack is not bad either.
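One possible way to get record-at-a-time reading in Python is itertools.groupby from the standard library, which can batch consecutive non-blank lines. This is only a sketch: read_records, split_file and the "record-" prefix are hypothetical names, not anything from the answers above.

```python
from itertools import groupby


def read_records(lines):
    """Yield each run of consecutive non-blank lines as a list of strings."""
    # groupby batches consecutive lines by whether they contain text;
    # the runs of blank lines (key False) are skipped.
    for has_text, group in groupby(lines, key=lambda line: bool(line.strip())):
        if has_text:
            yield list(group)


def split_file(in_path, prefix="record-"):
    """Write each record to prefix0, prefix1, ... and return the record count."""
    count = 0
    with open(in_path) as src:
        for count, record in enumerate(read_records(src), start=1):
            with open(f"{prefix}{count - 1}", "w") as out:
                out.writelines(record)
    return count
```

Only one record is held in memory at a time, and several blank lines in a row are treated as a single separator, so no empty output files are created.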

Here is a start:

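with open('in_file') as input_file:
    processing = False  # True while an output file is open
    out_file = None
    i = 0
    for line in input_file:
        if line.strip() and not processing:
            # First non-blank line of a block: open the next output file
            out_file = open('output - {}'.format(i), 'w')
            out_file.write(line)
            processing = True
            i += 1
        elif line.strip():
            out_file.write(line)
        elif processing:
            # Blank line: the current block is finished
            processing = False
            out_file.close()
    # Close the last file if the input does not end with a blank line
    if processing:
        out_file.close()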

This code uses the processing flag to keep track of whether a file is currently being written to. It resets the flag when it sees a blank line, and it opens a new file at the first non-blank line that follows.

Hope it helps.
