
Pandas.read_csv not reading full header

I have a CSV file that stores the positions and velocities of particles like this:

x, y, z, vx, vy, vz
-0.960, 0.870, -0.490, 962.17, -566.10, 713.40
1.450, 0.777, 2.270, -786.27, 63.31, -441.00
-3.350, -1.640, 1.313, 879.20, 637.76, -556.24
-0.504, 2.970, -0.278, 613.22, -717.32, 557.02
0.338, 0.220, 0.090, -927.18, -778.77, -443.05
...

I'm trying to read this file into a Pandas DataFrame in a script with read_csv, but I get an error when accessing any column except the first one:

AttributeError: 'DataFrame' object has no attribute 'y'

I never get the error for the 'x' column, so I wrote a snippet to see whether I could figure out where the reading error was coming from.

import pandas as pd
data = pd.read_csv('snap.csv')
print(data)    # whole frame
print(data.x)  # first column: works
print(data.y)  # second column: raises AttributeError

The console correctly prints out

          x      y      z       vx       vy       vz       
0    -0.960  0.870 -0.490   962.17  -566.10   713.40   
1     1.450  0.777  2.270  -786.27    63.31  -441.00   
2    -3.350 -1.640  1.313   879.20   637.76  -556.24  
3    -0.504  2.970 -0.278   613.22  -717.32   557.02  
4     0.338  0.220  0.090  -927.18  -778.77  -443.05 
...

meaning it appears to be assigning the columns the correct names. Then

0      -0.960
1       1.450
2      -3.350
3      -0.504
4       0.338  
...

showing it can extract one of the columns correctly. But then it throws the error again when trying to print the second column:

AttributeError: 'DataFrame' object has no attribute 'y'

I then looped through data.itertuples() and printed the first row on its own to see what it looked like. It confirmed that only the first column was accessible by its name and none of the others were.

Pandas(Index=0, x=-0.96, _2=0.87, _3=-0.49, _4=962.17, _5=-566.1, _6=713.4)

There aren't any other problems with the data. The values all correspond to the right index. It's just that the names are not being assigned correctly, and only the first column can be accessed by name. I tried putting single quotes around each column name, and that produces exactly the same errors. I know there are ways I might be able to work around this, such as assigning the names in the read_csv call, but I'm curious what the underlying issue is so that I can avoid it in the future.
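One way to see which names pandas actually assigned is to print the column labels as a list, so that any surrounding whitespace becomes visible. The snippet below is a minimal sketch, assuming the header line has a space after each comma exactly as in the sample above. The in-memory `sample` string is a stand-in for snap.csv and is not part of the original script.

```python
import io

import pandas as pd

# Stand-in for snap.csv (assumption: a space follows every comma, as in the sample above).
sample = (
    "x, y, z, vx, vy, vz\n"
    "-0.960, 0.870, -0.490, 962.17, -566.10, 713.40\n"
    "1.450, 0.777, 2.270, -786.27, 63.31, -441.00\n"
)

data = pd.read_csv(io.StringIO(sample))

# Printing the labels as a list shows the quotes, so leading spaces are visible.
columns = list(data.columns)
print(columns)  # ['x', ' y', ' z', ' vx', ' vy', ' vz']

# Only 'x' is a clean name; the others carry the space that followed the comma.
has_y = "y" in data.columns
has_space_y = " y" in data.columns
print(has_y, has_space_y)  # False True
```

A label such as ' y' is not a valid Python identifier, which would explain both why data.y fails and why itertuples() falls back to positional names like _2.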

Try declaring the column names when creating the DataFrame.

df = pd.read_csv("snap.csv", header=0, names=["x", "y", "z", "vx", "vy", "vz"])  # header=0 replaces the file's own header row
# or rename the columns after reading:
df = pd.read_csv("snap.csv")
df.columns = ["x", "y", "z", "vx", "vy", "vz"]
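As for the underlying cause: the sample header is separated by a comma followed by a space, and read_csv keeps that space as part of each column name by default. Two ways to handle it without retyping the names are sketched below, assuming the file looks like the sample in the question. The in-memory `sample` string stands in for snap.csv and is only there to make the example self-contained.

```python
import io

import pandas as pd

# Stand-in for snap.csv (assumption: same layout as the sample in the question).
sample = (
    "x, y, z, vx, vy, vz\n"
    "-0.960, 0.870, -0.490, 962.17, -566.10, 713.40\n"
    "1.450, 0.777, 2.270, -786.27, 63.31, -441.00\n"
)

# Option 1: have the parser skip the whitespace that follows each delimiter.
fixed = pd.read_csv(io.StringIO(sample), skipinitialspace=True)

# Option 2: read the file as-is, then strip whitespace from the column labels.
stripped = pd.read_csv(io.StringIO(sample))
stripped.columns = stripped.columns.str.strip()

print(list(fixed.columns))     # ['x', 'y', 'z', 'vx', 'vy', 'vz']
print(list(stripped.columns))  # ['x', 'y', 'z', 'vx', 'vy', 'vz']
print(fixed.y.iloc[0])         # attribute access now works for 'y'
```

Option 1 deals with the spaces at parse time and needs no knowledge of the column names. Option 2 is useful when the frame has already been loaded.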
