
Pandas read csv is shifting columns

I'm trying to create a DataFrame from a CSV file that has 4 empty columns. When I open it in LibreOffice or Excel, the empty columns are identified correctly. However, reading it with pd.read_csv() ends up shifting the columns' values by one.

How can I solve this? It seems like a problem with the pandas read_csv() method.

My code is really standard:

import pandas as pd
# read_csv is a top-level pandas function, not a DataFrame method
df = pd.read_csv('csv_file.csv', sep=',')
df.head()

I changed the headers and used this:

df = pd.read_csv('csv_file.csv', sep=',', index_col=False)

This solved the problem, but what in my previous headers was causing it?

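It seems you need the parameter index_col=False so that read_csv does NOT use the first column as the index. pandas does that automatically when the data rows contain more fields than the header row (for example, because every data line ends with a trailing delimiter), and that is what shifts the values by one column relative to their headers. The sep=',' parameter can be omitted because it is the default value: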

df = pd.read_csv('csv_file.csv', index_col=False)
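A minimal, self-contained reproduction, assuming the cause is a trailing delimiter at the end of every data line. The inline data is made up for illustration (it is not the asker's file), and io.StringIO stands in for the CSV file on disk:

```python
from io import StringIO

import pandas as pd

# Header has 3 fields, but each data row has 4 because of the trailing comma.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default: pandas sees one extra field per row and uses the first column as the
# index, so every value lands one column to the left of its real header.
shifted = pd.read_csv(StringIO(data))
print(shifted)

# index_col=False: keep the default RangeIndex; the empty trailing field is dropped.
fixed = pd.read_csv(StringIO(data), index_col=False)
print(fixed)
```

In `shifted`, the numbers 4 and 8 become the index and the last column `c` is all NaN; in `fixed`, the three columns line up with their headers.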

Your sample:

df = pd.read_csv('teste2.csv', index_col=False)
print (df)
  Header1 Header2  Header3  Unnamed: 3  Unnamed: 4  Header4  Header5  Header6  \
0     ptn  M00001        0         NaN         NaN        2        0        0   

   Header7  Header8    ...     Header22  Header23  Header24  Header25  \
0        0  -31.573    ...       -0.375       0.0   -64.168   276.586   

   Header26  Header27  Unnamed: 29  Unnamed: 30  Header28  Header29  
0    -0.232       0.0          NaN          NaN     0.702       1.0  

[1 rows x 33 columns]
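The `Unnamed: 3`, `Unnamed: 4`, `Unnamed: 29` and `Unnamed: 30` labels above are how read_csv names columns whose header cell is empty; the number is the column's zero-based position. A small sketch with made-up data showing this:

```python
from io import StringIO

import pandas as pd

# The second header cell is blank, and so is the second value in the data row.
data = "Header1,,Header3\nptn,,2\n"

df = pd.read_csv(StringIO(data))
print(df.columns.tolist())  # ['Header1', 'Unnamed: 1', 'Header3']
print(df)
```

The empty column is kept (filled with NaN), so blank header cells by themselves do not shift anything.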

I encountered the same problem. If your columns have no headings, try writing a heading at the top of each column. This time, read_csv() also reads the headings and uses them as column labels.
After that, convert the DataFrame to an array with

df = df.values

and the headings are gone.
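A short sketch of that approach on made-up data. It uses DataFrame.to_numpy(), which the pandas documentation recommends over `.values`; both return only the cell values, without the header row:

```python
from io import StringIO

import pandas as pd

# Made-up file content: a heading row has been added above the data.
data = "col1,col2,col3\n1,2,3\n4,5,6\n"

df = pd.read_csv(StringIO(data))  # the first line is consumed as column labels
arr = df.to_numpy()               # plain NumPy array: values only, no headings
print(arr)
```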

The problem occurs if your lines end with a delimiter (here a comma [,]), which creates an empty cell that is generally not visible in MS Excel. If your CSV line looks like this:

1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain,

then modify it to:

1,2282816,102.97245065789474,2432,0.8333333333333334,0.1388888888888889,certain

and pd.read_csv(fileName) will work fine.
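If editing the file by hand is impractical, the same fix can be applied in code before parsing. A minimal sketch, assuming every affected line carries exactly one spurious trailing comma (a legitimately empty last column also ends in a comma, so check your data first). `strip_trailing_delimiter` is a hypothetical helper written for this example, not a pandas function, and the sample text is made up:

```python
from io import StringIO

import pandas as pd


def strip_trailing_delimiter(text: str, sep: str = ",") -> str:
    """Hypothetical helper: remove one trailing `sep` from every line that ends with it."""
    lines = text.splitlines()
    cleaned = [line[: -len(sep)] if line.endswith(sep) else line for line in lines]
    return "\n".join(cleaned) + "\n"


# Made-up sample with the trailing comma described above.
raw = "id,value,label\n1,102.97,certain,\n2,98.50,uncertain,\n"

df = pd.read_csv(StringIO(strip_trailing_delimiter(raw)))
print(df)
```

For a real file, read its text first (for example with `open(fileName).read()`) and pass the cleaned string to read_csv the same way.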

I had a similar problem. Here is how I solved it:

  1. Opened the Excel file with Google Sheets on Google Drive
  2. Downloaded the spreadsheet as a CSV file
  3. Read the CSV file via pandas.read_csv('filename', sep=',', index_col=False)

Problem resolved.

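Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM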