Pandas: ignore all lines following a specific string when reading a file into a DataFrame
I have a file that I want to read into a pandas DataFrame. It can be summarized as this:
[Header]
Some_info = some_info
[Data]
Col1 Col2
0.532 Point
0.234 Point
0.123 Point
1.455 Square
14.64 Square
[Other data]
Other1 Other2
Test1 PASS
Test2 FAIL
My goal is to read only the portion of text between [Data] and [Other data], which is variable (different length). The header always has the same length, so skiprows from pandas.read_csv can be used. However, skipfooter needs the number of lines to skip, which can change between files.
What would be the best solution here? I would like to avoid altering the file externally unless there's no other solution.
NumPy's genfromtxt can take a generator as input (rather than a file directly), so the generator can simply stop yielding as soon as it hits your footer. The resulting structured array can then be converted to a pandas DataFrame. It's not ideal, but it didn't look like pandas' read_csv could take the generator directly.
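import numpy as np
import pandas as pd

def skip_variable_footer(infile):
    # Yield lines until the footer marker is reached.
    for line in infile:
        if line.startswith('[Other data]'):
            # A bare return ends the generator; raising StopIteration
            # inside a generator is a RuntimeError since Python 3.7.
            return
        yield line

# filename is the path to the input file
with open(filename, 'r') as infile:
    data = np.genfromtxt(skip_variable_footer(infile), delimiter=',', names=True, dtype=None)

df = pd.DataFrame(data)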
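As a variation on the generator idea above, the following is a minimal sketch that also skips everything before the [Data] marker, so no fixed header length is needed. The function name section_lines and the in-memory sample are hypothetical, chosen only for illustration; the sample mirrors the layout shown in the question.

```python
import io

def section_lines(lines, start="[Data]", stop="[Other data]"):
    """Yield the lines strictly between the start and stop markers."""
    inside = False
    for line in lines:
        stripped = line.strip()
        if not inside:
            # Skip everything up to and including the start marker.
            inside = stripped == start
            continue
        if stripped == stop:
            # Returning ends the generator cleanly at the footer.
            return
        yield line

# Hypothetical sample mirroring the layout in the question.
sample = io.StringIO(
    "[Header]\n"
    "Some_info = some_info\n"
    "[Data]\n"
    "Col1 Col2\n"
    "0.532 Point\n"
    "1.455 Square\n"
    "[Other data]\n"
    "Other1 Other2\n"
    "Test1 PASS\n"
)

rows = list(section_lines(sample))
print(rows)
```

Since read_csv accepts file-like objects, the collected lines could probably be handed to pandas without going through NumPy, for example pd.read_csv(io.StringIO(''.join(rows)), sep=r'\s+'), assuming the real file is whitespace-separated as in the summary.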
This method has to run over the file twice: once to count the footer lines and once to read the data.
import itertools as it
import pandas as pd

def get_footer(file_):
    # Count the lines from the '[Other data]' marker to the end of the file.
    with open(file_) as f:
        g = it.dropwhile(lambda x: x != '[Other data]\n', f)
        footer_len = sum(1 for _ in g)
    return footer_len

footer_len = get_footer('file.txt')
df = pd.read_csv('file.txt', … skipfooter=footer_len)
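If the header length should not be hard-coded either, the counting pass can return both values at once. The sketch below is one way to do that under the assumption that the markers appear alone on their lines; section_bounds is a hypothetical helper name, and the sample list reproduces the file from the question.

```python
def section_bounds(lines, start="[Data]", stop="[Other data]"):
    """Return (skiprows, skipfooter) from a single pass over the lines."""
    skiprows = None
    stop_index = None
    total = 0
    for index, line in enumerate(lines):
        total += 1
        stripped = line.strip()
        if skiprows is None and stripped == start:
            # Skip every line up to and including the start marker.
            skiprows = index + 1
        elif skiprows is not None and stop_index is None and stripped == stop:
            stop_index = index
    if skiprows is None:
        raise ValueError("start marker not found")
    # The footer runs from the stop marker to the end of the input.
    skipfooter = 0 if stop_index is None else total - stop_index
    return skiprows, skipfooter

# The sample from the question, one string per line.
sample = [
    "[Header]\n", "Some_info = some_info\n", "[Data]\n", "Col1 Col2\n",
    "0.532 Point\n", "0.234 Point\n", "0.123 Point\n", "1.455 Square\n",
    "14.64 Square\n", "[Other data]\n", "Other1 Other2\n",
    "Test1 PASS\n", "Test2 FAIL\n",
]

skiprows, skipfooter = section_bounds(sample)
print(skiprows, skipfooter)  # 3 4
```

With a real file, the same function can be called on the open file object, and the two numbers passed to read_csv as skiprows and skipfooter. Note that skipfooter is only supported by the Python parser engine, so engine='python' avoids a fallback warning. The file is still read twice overall: once here and once by read_csv.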