Python: Split file by line match
I have a text file with different sections that I would like to split into separate files. In the example below, the split points would be the "Step" lines.
Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot
x(mm),y(mm),z(mm),Bx(T),By(T),Bz(T),Bm(T)
5.505E+01,-1.124E-02,-2.000E+00, 3.443E-04,-1.523E-05, 3.913E-04
5.511E+01,-1.124E-02,-2.000E+00, 3.417E-04,-1.511E-05, 3.912E-04
5.516E+01,-1.124E-02,-2.000E+00, 3.390E-04,-1.499E-05, 3.910E-04
...
Step Number: 2; Plot Name: deg0_R58; Type: Arrow Plot
...
The reason for this is that the pandas function pandas.read_csv() will not work on the entire file because of the "Step" lines.
I only need the files temporarily for pandas.read_csv(), so I don't actually want to write them to disk. I've tried slicing the file with itertools.islice, but then I can't process the output with pandas.read_csv because it needs a file-like object.
Here is what I've got so far:
buf = []
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' in line:
            buf.append([])  # a "Step" line starts a new section
        else:
            buf[-1].append(line)
Is there a way to get the buf list of lines into a file-like object?
Thanks for the input, StringIO works great! Here's what I made of it, in case anyone is facing a similar problem:
from io import StringIO  # Python 3; the original used Python 2's StringIO.StringIO()

import pandas as pd

steps_Dict = {}
fsection = None
step_nr = 0
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' in line:
            if fsection:  # a previous section is complete, so parse it
                step_nr = step_nr + 1  # steps start with 1
                fsection.seek(0)
                steps_Dict[step_nr] = pd.read_csv(fsection, sep=',', header=0)
            fsection = StringIO()  # new section
        else:  # append to section
            # Skip blank lines; with pandas 0.16, the read_csv parameter
            # skip_blank_lines=True could perhaps be used instead.
            if line.strip():
                fsection.write(line)
if fsection:  # captures the last section
    fsection.seek(0)
    steps_Dict[step_nr + 1] = pd.read_csv(fsection, sep=',', header=0)
# pd.Panel only exists in older pandas releases (it was removed in 0.25);
# on newer versions, pd.concat(steps_Dict) gives one MultiIndexed DataFrame.
steps_Panel = pd.Panel(steps_Dict)
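For readers on Python 3 and a current pandas, the same idea can be sketched as a small function. This is only an illustration under some assumptions: the name `split_steps` and the shortened three-column sample are made up here, and the result is a plain dict of DataFrames rather than a Panel.

```python
from io import StringIO

import pandas as pd


def split_steps(lines):
    """Return {step number: DataFrame}, one entry per "Step" section."""
    sections = []  # one list of CSV lines per section
    for line in lines:
        if "Step" in line:
            sections.append([])           # a "Step" line opens a new section
        elif line.strip() and sections:   # skip blank lines and any preamble
            sections[-1].append(line)
    # Steps are numbered from 1, in order of appearance.
    return {
        nr: pd.read_csv(StringIO("".join(chunk)))
        for nr, chunk in enumerate(sections, start=1)
    }


# Hypothetical two-step sample shaped like the data in the question.
sample = (
    "Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot\n"
    "x(mm),y(mm),z(mm)\n"
    "5.505E+01,-1.124E-02,-2.000E+00\n"
    "5.511E+01,-1.124E-02,-2.000E+00\n"
    "\n"
    "Step Number: 2; Plot Name: deg0_R58; Type: Arrow Plot\n"
    "x(mm),y(mm),z(mm)\n"
    "5.516E+01,-1.124E-02,-2.000E+00\n"
)
steps = split_steps(sample.splitlines(keepends=True))
```

For a real file, the open file object can be passed directly, e.g. `with open(filepath) as f: steps = split_steps(f)`, since iterating a file yields lines with their newlines intact.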
You can use StringIO to store the text if you don't need to write it to a file.
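from io import StringIO  # Python 3; in Python 2: import StringIO, then StringIO.StringIO()

output = StringIO()
with open(filepath, 'r') as f:
    for line in f:
        if 'Step' not in line:  # copy everything except the "Step" lines
            output.write(line)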
Then you can use pandas' read_csv function with output.
As @Julien pointed out in the comments, you also need to call output.seek(0) before reading it with pandas:
import pandas as pd
output.seek(0)
pd.read_csv(output)
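The need for `seek(0)` can be seen with a small sketch using Python 3's `io.StringIO` (the buffer name and contents are made up for illustration). After writing, the read position sits at the end of the buffer, so a reader such as `pd.read_csv` would find nothing left to parse until the buffer is rewound.

```python
from io import StringIO

buf = StringIO()
buf.write("a,b\n1,2\n")

at_end = buf.read()      # position is at the end after writing, so this is empty
buf.seek(0)              # rewind to the start of the buffer
from_start = buf.read()  # now the full contents come back
```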
You could use StringIO to create a file-like object that pd.read_csv() can read:
from io import StringIO  # Python 3; in Python 2: import StringIO, then StringIO.StringIO()
import pandas as pd
astr = StringIO()
astr.write('This,is,a,test\n')
astr.write('This,is,another,test\n')
astr.seek(0)
df = pd.read_csv(astr)
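Applied to the question's `buf`, where each entry is already a list of lines, one option is to join the lines and pass the resulting string to the `StringIO` constructor. The sketch below assumes Python 3 and uses a made-up two-column stand-in for one `buf` entry.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for one entry of the question's `buf` list.
lines = ["x(mm),y(mm)\n", "5.505E+01,-1.124E-02\n", "5.511E+01,-1.124E-02\n"]

# A StringIO built from an initial string starts at position 0,
# so no seek(0) is needed before reading.
section = StringIO("".join(lines))
df = pd.read_csv(section)
```

This avoids the write-then-rewind step, at the cost of building the whole section as one string first.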
You can use the pandas.io.parsers.read_csv function to skip the lines you don't need and read the file directly into a DataFrame.
import pandas
z = pandas.io.parsers.read_csv("C:/path/a.txt", skiprows=0, header=1, sep=",")
z
   x(mm)    y(mm)  z(mm)     Bx(T)     By(T)     Bz(T)  Bm(T)
0  55.05 -0.01124     -2  0.000344 -0.000015  0.000391    NaN
1  55.11 -0.01124     -2  0.000342 -0.000015  0.000391    NaN
2  55.16 -0.01124     -2  0.000339 -0.000015  0.000391    NaN
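To see what `header=1` does here, the following sketch reads a made-up single-section sample from memory instead of a file on disk. The second line (index 1) becomes the column names and the "Step" line above it is dropped. The data rows have six values for seven column names, which is why `Bm(T)` comes out as NaN.

```python
from io import StringIO

import pandas as pd

# Hypothetical single-section sample: one "Step" line, a header, two data rows.
sample = StringIO(
    "Step Number: 1; Plot Name: deg0_R58; Type: Arrow Plot\n"
    "x(mm),y(mm),z(mm),Bx(T),By(T),Bz(T),Bm(T)\n"
    "5.505E+01,-1.124E-02,-2.000E+00, 3.443E-04,-1.523E-05, 3.913E-04\n"
    "5.511E+01,-1.124E-02,-2.000E+00, 3.417E-04,-1.511E-05, 3.912E-04\n"
)

# header=1 takes line index 1 as the column names and discards line 0.
df = pd.read_csv(sample, header=1)
```

Note that this only handles the first "Step" line. In a file with several steps, the later "Step" lines would still be read as data rows, so this approach seems best suited to a single section.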