Python, pandas.read_csv on large csv file with 10 Million rows from Google Drive file

I exported a .csv file with 2 columns and 10 million rows from Google BigQuery.

I downloaded the file locally as a 170 MB .csv, then uploaded it to Google Drive, and I want to use the pandas.read_csv() function to read it into a pandas DataFrame in my Jupyter Notebook.

Here is the code I used, with the specific file ID that I want to read.

import pandas as pd

# Read the .csv stored on Google Drive into a pandas DataFrame.
follow_network_df = pd.read_csv("https://drive.google.com/uc?export=download&id=1WqHWdgMVLPKVbFzIIprBBhe3I9faq4HA")

Here is what I got: [image]

It seems the 170 MB csv file is being read as an HTML page rather than as CSV data?

However, when I tried the same code with another csv file of 40 MB, it worked perfectly:

# Another csv file, 40 MB.
user_behavior_df = pd.read_csv("https://drive.google.com/uc?export=download&id=1NT3HZmrrbgUVBz5o6z_JwW5A5vRXOgJo")

[image]

Can anyone give me a hint about the root cause of the difference? Any ideas on how to read a csv file of 10 million rows and 170 MB from online storage? I know I could read the 10 million rows into a pandas DataFrame directly through the BigQuery interface or from my local machine, but I have to include this as part of my submission, so I can only read from an online source.

The problem is that your first file is too large for Google Drive to scan for viruses, so a user prompt is displayed instead of the actual file. You can see this if you open the first file's link in a browser.
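To tell from code which of the two you received, one option is to look at the first bytes of the response before handing it to pandas. The following is a minimal sketch using only the standard library; `looks_like_html` and `peek` are hypothetical helper names, and the sample payloads (including the column names) are made up for illustration.

```python
import urllib.request


def looks_like_html(payload: bytes) -> bool:
    """Return True if the payload starts like an HTML document rather than CSV data."""
    head = payload[:512].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))


def peek(url: str, n: int = 512) -> bytes:
    """Fetch only the first n bytes of the response body (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read(n)


# Offline illustration with made-up payloads:
print(looks_like_html(b"<!DOCTYPE html><html><body>...</body></html>"))  # True
print(looks_like_html(b"user_id,follower_id\n1,2\n"))  # False
```

Calling `looks_like_html(peek(url))` on each of the two links should then show whether the response is a prompt page or the file itself.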

I'd suggest clicking through the user prompt and using the URL it leads to with pd.read_csv.
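If you would rather not click through the prompt by hand, one commonly used workaround is to add a confirmation parameter to the download URL. The sketch below assumes Google Drive accepts a `confirm` query parameter for this purpose; that is an assumption about undocumented behaviour that may change, not something stated in this answer. `with_confirm` and `read_drive_csv` are hypothetical helper names, and `FILE_ID` is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def with_confirm(url: str, token: str = "t") -> str:
    """Return the URL with confirm=<token> set (assumed to skip the prompt)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["confirm"] = [token]  # add or overwrite the confirm parameter
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def read_drive_csv(url: str, **read_csv_kwargs):
    """Read a CSV from a Drive download URL; needs pandas and network access."""
    import pandas as pd  # imported here so with_confirm works without pandas

    return pd.read_csv(with_confirm(url), **read_csv_kwargs)


# Only builds the URL; nothing is downloaded here.
print(with_confirm("https://drive.google.com/uc?export=download&id=FILE_ID"))
```

If the result still parses as HTML, the parameter was not accepted and the URL obtained by clicking through the prompt manually is the safer choice.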
