
Why is Pandas read_csv not reading all the data?

I would like to know whether the problem I have with a particular CSV file is a general pandas bug or something related to the CSV file itself. I used pandas `read_csv` to load the data, but unfortunately this function is not loading all the values. I noticed the problem because I was fairly sure the file contains data for a particular date range (2017/04/01 - 2017/04/02), so I checked the file in Excel and, as I expected, the values are there. I saved the file as .xlsx, read it again with pandas using `read_excel`, and the data loaded successfully. The strangest part is that the problem appears only on some dates, with no visible pattern: `read_csv` loads some of the values, but not all of them.
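A minimal sketch for pinning down exactly which timestamps are affected, assuming both files have been loaded into DataFrames indexed by the parsed `Fecha` column (with unique timestamps) and a value column `h50`, as in the question's code. `missing_in_csv` is a hypothetical helper, not part of pandas:

```python
import pandas as pd


def missing_in_csv(csv_df: pd.DataFrame, xlsx_df: pd.DataFrame, col: str = "h50") -> pd.Index:
    """Return the timestamps where the CSV load is NaN but the Excel load has a value."""
    # Align both columns on the union of their timestamp indexes so rows match one-to-one.
    csv_vals, xlsx_vals = csv_df[col].align(xlsx_df[col], join="outer")
    mask = csv_vals.isna() & xlsx_vals.notna()
    # Keep only the index labels where the mask is True.
    return mask[mask].index
```

Running it on the two `resume` frames from the question would list the affected timestamps, which makes it easier to spot a pattern (for example, only certain days of the month).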

It is the same file. When the data was first processed, the file was saved as .csv; later, I created the .xlsx from that .csv in Excel.

CSV file: https://drive.google.com/file/d/1VCte8jCu8dB-Qp4KHClZb5cEAUTzZ5lC/view?usp=sharing

Excel file: https://docs.google.com/spreadsheets/d/1p5zJuDhS7PvLwSJMtexRrUHOvC6qexMs/edit?usp=sharing&ouid=112818913372395231829&rtpof=true&sd=true

Excel case:

```python
import pandas as pnd  # the question uses "pnd" as the pandas alias

resume = pnd.read_excel("/content/gdrive/hcln/h_RiA_0.50_full_time.xlsx", sheet_name="h_RiA_0.50_full_time (1)", parse_dates=[0])
resume = resume.set_index(["Fecha"])
resume.loc["2017/04/01 23"]  # partial-string indexing: every row in hour 23 of 2017-04-01
```

```
                       h50
Fecha
2017-04-01 23:00:00  309.0
2017-04-01 23:05:00  287.0
2017-04-01 23:10:00  315.0
2017-04-01 23:15:00  324.0
2017-04-01 23:20:00  325.0
2017-04-01 23:25:00  340.0
2017-04-01 23:30:00  323.0
2017-04-01 23:35:00  330.0
2017-04-01 23:40:00  332.0
2017-04-01 23:45:00  308.0
2017-04-01 23:50:00  319.0
2017-04-01 23:55:00  289.0
```

CSV case:

```python
import pandas as pnd  # the question uses "pnd" as the pandas alias

resume = pnd.read_csv("/content/gdrive/MyDrive/hcln/h_RiA_0.50_full_time.csv", parse_dates=[0])
resume = resume.set_index(["Fecha"])
resume.loc["2017/04/01 23"]  # same hour as in the Excel case
```

```
                      h50
Fecha
2017-04-01 23:00:00   NaN
2017-04-01 23:05:00   NaN
2017-04-01 23:10:00   NaN
2017-04-01 23:15:00   NaN
2017-04-01 23:20:00   NaN
2017-04-01 23:25:00   NaN
2017-04-01 23:30:00   NaN
2017-04-01 23:35:00   NaN
2017-04-01 23:40:00   NaN
2017-04-01 23:45:00   NaN
2017-04-01 23:50:00   NaN
2017-04-01 23:55:00   NaN
```
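Because the missing values show up only on some dates, counting NaN rows per day after loading the CSV may reveal whether they cluster on particular days. A minimal sketch, assuming the same `Fecha`/`h50` layout as the question; `nan_counts_per_day` is a hypothetical helper, and the small in-memory CSV stands in for the real file:

```python
import io

import pandas as pd


def nan_counts_per_day(csv_source, date_col: str = "Fecha", value_col: str = "h50") -> pd.Series:
    """Load a CSV shaped like the question's and count missing values per calendar day."""
    df = pd.read_csv(csv_source, parse_dates=[date_col]).set_index(date_col)
    missing = df[value_col].isna()
    # Group the boolean mask by calendar date; summing the True values gives NaNs per day.
    counts = missing.groupby(missing.index.date).sum()
    return counts[counts > 0]  # keep only days that actually have gaps


sample = io.StringIO(
    "Fecha,h50\n"
    "2017-04-01 23:00:00,\n"
    "2017-04-01 23:05:00,287\n"
    "2017-04-02 00:00:00,301\n"
)
print(nan_counts_per_day(sample))
```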

If any of you can figure out what the error is, I would appreciate your answer. Here are the views I got in Google Colab:

(Screenshots: CSV view, Excel view)

I found the answer. Some time ago, I changed the name of the CSV's column to "h50" in Excel. Excel did not show any warning at that moment, so I assumed it would not affect the values in the file. Apparently that was the cause: I re-ran the process that generates **h_RiA_0.50_full_time.csv**, and with the regenerated file all the values load correctly with `read_csv`.

I suppose the problem comes from changing the column name in Excel: for some reason, doing that causes problems when the values are loaded.
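If the header does need a new name, one way to avoid re-saving the data through Excel is to rename it with pandas instead. A minimal sketch, assuming the old header name is known (the question does not say what it was, so `"valor"` below is made up) and that `rename_column_csv` is a hypothetical helper:

```python
import io

import pandas as pd


def rename_column_csv(src, dst, old_name: str, new_name: str = "h50") -> None:
    """Rename one CSV column without round-tripping the file through Excel."""
    # dtype=str and keep_default_na=False keep every cell as the text found in the file,
    # so dates and numbers are not re-parsed or reformatted on the way back out.
    df = pd.read_csv(src, dtype=str, keep_default_na=False)
    df = df.rename(columns={old_name: new_name})
    df.to_csv(dst, index=False)  # index=False: don't write pandas' row index as a column


# Usage with in-memory files; "valor" is a made-up original header name.
src = io.StringIO("Fecha,valor\n2017-04-01 23:00:00,309\n2017-04-01 23:05:00,\n")
dst = io.StringIO()
rename_column_csv(src, dst, old_name="valor")
print(dst.getvalue())
```

Only the header changes; each data row is written back with the same text it was read with.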

This may be completely off the mark.

But whenever I have mounted Google Drive from Google Colab, I have usually used:

/content/drive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx

instead of:

/content/gdrive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx
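The mount point is whatever was passed to `drive.mount(...)` in the notebook (Colab's usual example uses `/content/drive`), so one way to catch a wrong prefix early is to check which candidate path actually exists before reading. A minimal sketch; `first_existing` is a hypothetical helper, and the commented usage simply reuses the two paths above:

```python
from pathlib import Path


def first_existing(*candidates: str) -> Path:
    """Return the first candidate path that exists, or raise listing every path tried."""
    for candidate in candidates:
        path = Path(candidate)
        if path.exists():
            return path
    raise FileNotFoundError("None of these paths exist: " + ", ".join(candidates))


# Usage in the notebook (paths copied from this answer):
# path = first_existing(
#     "/content/drive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx",
#     "/content/gdrive/MyDrive/Maestría/Tesis/Read_f/hcln/h_RiA_0.50_full_time.xlsx",
# )
```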

Another possibility is that the format it reads must be .csv, not .xlsx.
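If the issue is simply that an .xlsx path is being handed to a CSV reader (or vice versa), one way to rule that out is to pick the reader from the file extension. A minimal sketch; `load_table` is a hypothetical helper, and reading Excel files this way requires an Excel engine (for example openpyxl for .xlsx) to be installed:

```python
from pathlib import Path

import pandas as pd


def load_table(path, **kwargs) -> pd.DataFrame:
    """Choose pandas' CSV or Excel reader based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path, **kwargs)
    if suffix in (".xlsx", ".xls"):
        # Requires an installed Excel engine (e.g. openpyxl for .xlsx).
        return pd.read_excel(path, **kwargs)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Extra keyword arguments such as `parse_dates=[0]` or `sheet_name=...` are passed straight through to the chosen reader.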

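Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.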

 
© 2020-2024 STACKOOM.COM