
Python: Split file by line match

I have a text file with several sections that I would like to split into separate files. In the example below, the split points are the "Step" lines.

Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot 
x(mm),y(mm),z(mm),Bx(T),By(T),Bz(T),Bm(T)
5.505E+01,-1.124E-02,-2.000E+00, 3.443E-04,-1.523E-05, 3.913E-04
5.511E+01,-1.124E-02,-2.000E+00, 3.417E-04,-1.511E-05, 3.912E-04
5.516E+01,-1.124E-02,-2.000E+00, 3.390E-04,-1.499E-05, 3.910E-04
...

Step Number: 2; Plot Name: deg0_R58; Type: Arrow Plot
...

The reason is that pandas.read_csv() will not work on the entire file because of the "Step" lines.

I only need the files temporarily for pandas.read_csv(), so I don't actually want to write them to disk. I've tried slicing the file with itertools.islice, but then I can't process the output with pandas.read_csv() because it needs a file-like object.

Here is what I've got so far:

buf = []
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' in line:
            buf.append([])        # start a new section
        else:
            buf[-1].append(line)  # assumes the file starts with a "Step" line

Is there a way to turn the lists of lines in buf into file-like objects?
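One possible way to do this, sketched under assumptions: join each list of lines and wrap the result in io.StringIO (Python 3). The helper name split_sections, its marker parameter, and the shortened two-column sample data are illustrative and not part of the original question.

```python
from io import StringIO


def split_sections(lines, marker="Step"):
    """Group lines into one in-memory text buffer per section.

    A new section starts at every line containing `marker`. The marker
    line itself and blank lines are dropped, and any lines before the
    first marker are ignored.
    """
    sections = []
    for line in lines:
        if marker in line:
            sections.append([])          # start a new section
        elif sections and line.strip():  # skip blank lines and any preamble
            sections[-1].append(line)
    # StringIO(initial_text) starts at position 0, so no seek(0) is needed.
    return [StringIO("".join(chunk)) for chunk in sections]


# Hypothetical, shortened sample in the shape of the file shown above.
sample = [
    "Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot\n",
    "x(mm),y(mm)\n",
    "5.505E+01,-1.124E-02\n",
    "\n",
    "Step Number: 2; Plot Name: deg0_R58; Type: Arrow Plot\n",
    "x(mm),y(mm)\n",
    "5.511E+01,-1.124E-02\n",
]
buffers = split_sections(sample)
```

Each element of buffers is a file-like object, so it should be usable as the first argument to pandas.read_csv(). For a real file, the open file handle can be passed as lines.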


Thanks for the input, StringIO works great! Here's what I made of it, in case anyone faces a similar problem:

from io import StringIO  # Python 2: import StringIO and use StringIO.StringIO()
import pandas as pd

steps_Dict = {}
fsection = None
step_nr = 0
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' in line:
            if fsection:
                step_nr = step_nr + 1   # Steps start with 1
                fsection.seek(0)
                steps_Dict[step_nr] = pd.read_csv(fsection, sep=',', header=0)
            fsection = StringIO()  # new section
        elif fsection is not None and line.strip():
            # Append to the current section, skipping blank lines
            # (pd.read_csv's skip_blank_lines=True may be an alternative).
            fsection.write(line)
    if fsection:    # captures the last section
        fsection.seek(0)
        steps_Dict[step_nr + 1] = pd.read_csv(fsection, sep=',', header=0)
# pd.Panel has been removed from current pandas versions; concatenating the
# dict gives one DataFrame with the step number as the outer index level.
steps = pd.concat(steps_Dict)
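The counter above assumes the steps are numbered consecutively starting at 1. If that does not hold, the number could instead be read from the "Step" line itself. The following is only a sketch: the regular expression is inferred from the single sample line in the question, and parse_step_line and STEP_RE are hypothetical names.

```python
import re

# Pattern inferred from the sample line
# "Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot".
# This is an assumption; other files may use a different layout.
STEP_RE = re.compile(
    r"Step Number:\s*(\d+);\s*Plot Name:\s*([^;]+);\s*Type:\s*(.+)"
)


def parse_step_line(line):
    """Return the fields of a "Step" line as a dict, or None if it does not match."""
    match = STEP_RE.match(line.strip())
    if match is None:
        return None
    number, plot_name, plot_type = match.groups()
    return {
        "step": int(number),
        "plot_name": plot_name.strip(),
        "type": plot_type.strip(),
    }


info = parse_step_line("Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot \n")
print(info)
```

The "step" value could then replace the running counter as the dictionary key, and a None result doubles as a stricter test than checking whether 'Step' occurs anywhere in the line.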

You can use StringIO to hold the text in memory if you don't need to write it to a file.

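from io import StringIO  # Python 2: import StringIO and use StringIO.StringIO()

output = StringIO()
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' not in line:  # drop the "Step" lines, keep everything else
            output.write(line)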

Then you can pass output to pandas' read_csv function.

As @Julien pointed out in a comment, you also need to call output.seek(0) before reading it with pandas:

import pandas as pd
output.seek(0)
pd.read_csv(output)
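The reason for the seek(0) call can be seen with the standard library alone. This is a small illustration using io.StringIO from Python 3; the variable names and the two-line CSV text are made up for the example.

```python
from io import StringIO

buf = StringIO()
buf.write("a,b\n1,2\n")

# After writing, the position sits at the end of the buffer,
# so a reader starting here finds nothing left to read.
at_end = buf.read()

buf.seek(0)              # rewind to the start
from_start = buf.read()  # now the whole content is returned

# A buffer created with an initial value starts at position 0,
# so it can be read without rewinding.
preloaded = StringIO("a,b\n1,2\n")
```

Here at_end is an empty string while from_start holds the full text, which is why a reader handed a freshly written buffer without rewinding would see no data.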

You could use StringIO to create a file-like object that pd.read_csv() can read:

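from io import StringIO  # Python 2: import StringIO and use StringIO.StringIO()
import pandas as pd

astr = StringIO()
astr.write('This,is,a,test\n')
astr.write('This,is,another,test\n')
astr.seek(0)  # rewind so read_csv starts at the beginning
df = pd.read_csv(astr)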

You can use the pandas.io.parsers.read_csv function, skip the lines you don't need, and read the file directly into a DataFrame. Here header=1 makes the second line of the file the header row, so the leading "Step" line is discarded.

import pandas
z = pandas.io.parsers.read_csv("C:/path/a.txt", skiprows=0, header=1, sep=",")
z

    x(mm)   y(mm)       z(mm)   Bx(T)       By(T)       Bz(T)       Bm(T)
0   55.05   -0.01124    -2      0.000344    -0.000015   0.000391    NaN
1   55.11   -0.01124    -2      0.000342    -0.000015   0.000391    NaN
2   55.16   -0.01124    -2      0.000339    -0.000015   0.000391    NaN
