Pandas: Read skipping lines that start with a certain string
I have a pandas DataFrame like this:
[6 rows x 5 columns]
name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs03 1.514483e+09 20.054 0 9.999970e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01
Cs01 1.514483e+09 10.907 0 9.963121e-01
Cs02 1.514483e+09 10.092 0 1.548312e-02
Can the read_csv function skip all rows whose name does not start with "Cs01"?
Thanks.
The simplest approach is to read the whole file and then filter the rows:
import pandas as pd
df = pd.read_csv('file')
# keep only the rows whose 'name' value starts with 'Cs01'
df = df[df['name'].str.startswith('Cs01')]
print(df)
name timestamp value1 state value2
0 Cs01 1.514483e+09 19.516 0 0.999995
3 Cs01 1.514483e+09 20.055 0 0.999995
4 Cs01 1.514483e+09 10.907 0 0.996312
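A self-contained sketch of the same filter, for trying it without a file on disk. The sample text is copied from the question; treating it as whitespace-separated (hence `sep=r"\s+"`) is an assumption, since the question does not show the raw file. The names `SAMPLE`, `mask`, `filtered` and `renumbered` are only illustrative.

```python
import io

import pandas as pd

# Sample data from the question. Whitespace separation is an assumption;
# drop the `sep` argument if your file is comma-separated.
SAMPLE = """name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs03 1.514483e+09 20.054 0 9.999970e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01
Cs01 1.514483e+09 10.907 0 9.963121e-01
Cs02 1.514483e+09 10.092 0 1.548312e-02
"""

df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# Boolean mask: True where the 'name' value starts with 'Cs01'.
mask = df["name"].str.startswith("Cs01")

# Filtering keeps the original row labels (0, 3, 4).
filtered = df[mask]

# Optionally renumber the rows from 0 after filtering.
renumbered = filtered.reset_index(drop=True)
print(renumbered)
```

Filtering after the read keeps the original index labels, which is why the output above shows 0, 3, 4; `reset_index(drop=True)` is only needed if a fresh 0-based index is wanted.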
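Another solution is a preprocessing step that collects the numbers of all lines that do not start with Cs01, and then passes them to read_csv through the skiprows parameter: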
# collect the 0-based numbers of the lines that do not start with 'Cs01'
with open('file.csv') as f:
    exclude = [i for i, line in enumerate(f) if not line.startswith('Cs01')]
print (exclude)
[0, 2, 3, 6]
# exclude[0] is the header line (0), so skip only exclude[1:] to keep it
df = pd.read_csv('file.csv', skiprows=exclude[1:])
print (df)
name timestamp value1 state value2
0 Cs01 1.514483e+09 19.516 0 0.999995
1 Cs01 1.514483e+09 20.055 0 0.999995
2 Cs01 1.514483e+09 10.907 0 0.996312
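One way is to read the file in chunks and filter out the unwanted rows within each chunk. This may be faster if your file is large and contains many unwanted rows, because reading the entire df first and then filtering may perform poorly: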
In[17]:
import io
import pandas as pd
t="""name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs03 1.514483e+09 20.054 0 9.999970e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01
Cs01 1.514483e+09 10.907 0 9.963121e-01
Cs02 1.514483e+09 10.092 0 1.548312e-02"""
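# sep=r'\s+' splits on whitespace (delim_whitespace=True is deprecated in recent pandas)
d = pd.read_csv(io.StringIO(t), sep=r'\s+', chunksize=2)
# filter each chunk, then combine the surviving rows into one DataFrame
dfs = pd.concat([x[x['name'].str.startswith('Cs01')] for x in d])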
dfs
Out[17]:
name timestamp value1 state value2
0 Cs01 1.514483e+09 19.516 0 0.999995
3 Cs01 1.514483e+09 20.055 0 0.999995
4 Cs01 1.514483e+09 10.907 0 0.996312
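Here, the chunksize parameter specifies the number of rows to read at a time; you can set it to any size. The list comprehension filters each chunk, and concat is then called to produce a single df.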
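The chunked approach can be wrapped in a small reusable function. This is a minimal sketch under stated assumptions: the helper name `filter_in_chunks` and its parameters are hypothetical, the input has at least one data row, and the sample is whitespace-separated.

```python
import io

import pandas as pd


def filter_in_chunks(source, prefix, chunksize=2, column="name", **read_csv_kwargs):
    """Read `source` in chunks and keep rows where `column` starts with `prefix`.

    Hypothetical helper: `source` is anything pd.read_csv accepts
    (a path or a file-like object).
    """
    # With chunksize set, read_csv returns an iterator of DataFrames.
    reader = pd.read_csv(source, chunksize=chunksize, **read_csv_kwargs)
    # Filter every chunk as it arrives, so unwanted rows are dropped early.
    parts = [chunk[chunk[column].str.startswith(prefix)] for chunk in reader]
    # ignore_index=True renumbers the combined rows from 0.
    return pd.concat(parts, ignore_index=True)


# Sample data from the question, assumed to be whitespace-separated.
text = """name timestamp value1 state value2
Cs01 1.514483e+09 19.516 0 9.999954e-01
Cs02 1.514483e+09 20.055 0 9.999363e-01
Cs03 1.514483e+09 20.054 0 9.999970e-01
Cs01 1.514483e+09 20.055 0 9.999949e-01
Cs01 1.514483e+09 10.907 0 9.963121e-01
Cs02 1.514483e+09 10.092 0 1.548312e-02
"""

result = filter_in_chunks(io.StringIO(text), "Cs01", chunksize=2, sep=r"\s+")
print(result)
```

A chunksize of 2 is only useful for the demonstration; for a large file a much bigger value (tens of thousands of rows or more) is the usual choice, since each chunk carries some fixed overhead.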