How to import data into Google Colab from Google Drive?
I have some data files uploaded to my Google Drive. I want to import those files into Google Colab.
The REST API method and the PyDrive method show how to create a new file and upload it to Drive and Colab. Using those, I am unable to figure out how to read the data files already present on my Drive from my Python code.
I am a total newbie to this. Can someone help me out?
(Update April 15, 2018: gspread is updated frequently, so to ensure a stable workflow I pin the versions.)
For spreadsheet files, the basic idea is to use the gspread and pandas packages to read spreadsheets in Drive and convert them to pandas DataFrames.
In the Colab notebook:
#install packages
!pip install gspread==2.1.1
!pip install gspread-dataframe==2.1.0
!pip install pandas==0.22.0
#import packages and authorize connection to Google account:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from google.colab import auth
auth.authenticate_user() # verify your account to read files which you have access to. Make sure you have permission to read the file!
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
I know of three ways to open a Google spreadsheet.
By file name:
spreadsheet = gc.open("goal.csv") # Open file using its name. Use this if the file is already anywhere in your drive
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By URL:
spreadsheet = gc.open_by_url('https://docs.google.com/spreadsheets/d/1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585') # use this when you have the complete URL (the gid in the URL fragment identifies a worksheet tab, not a permission)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By file key/ID:
spreadsheet = gc.open_by_key('1vpukIbGZfK1IhCLFalBI3JT3aobySanJysv0k5A4oMg') # use this when you have the key (the string in the URL following spreadsheets/d/)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
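The key passed to open_by_key is simply the path segment that follows spreadsheets/d/ in the sheet's URL. A minimal sketch of pulling it out with a regular expression, assuming the usual docs.google.com URL shape; extract_sheet_key is a hypothetical helper written for illustration, not part of gspread:

```python
import re

# Hypothetical helper (not a gspread function): extract the spreadsheet key
# from a Google Sheets URL of the form .../spreadsheets/d/<key>/...
def extract_sheet_key(url):
    match = re.search(r"/spreadsheets/d/([a-zA-Z0-9_-]+)", url)
    if match is None:
        raise ValueError("no spreadsheet key found in URL: " + url)
    return match.group(1)

url = "https://docs.google.com/spreadsheets/d/1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585"
key = extract_sheet_key(url)
print(key)
```

The resulting key could then be handed to gc.open_by_key(key) in place of the hard-coded string.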
I shared the code above in a Colab notebook: https://drive.google.com/file/d/1cvur-jpIpoEN3vAO8Fd_yVAT5Qgbr4GV/view?usp=sharing
Source: https://github.com/burnash/gspread
Set your data to be publicly available; then, for public spreadsheets:
from io import StringIO # in Python 2 this was `from StringIO import StringIO`
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.text # decoded text; r.content is bytes, which StringIO does not accept in Python 3
df = pd.read_csv(StringIO(data), index_col=0, parse_dates=['Quradate'])
df.head()
More here: Getting Google Spreadsheet CSV into A Pandas Dataframe
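The ccc?key=...&output=csv address above is an older URL form. As far as I know, publicly shared sheets can also be fetched as CSV through an export address built from the spreadsheet key and the worksheet gid. A small sketch of constructing that address; the URL pattern is an assumption you should verify against your own sheet, and csv_export_url and the placeholder key are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical helper: build a CSV export URL for a publicly shared sheet.
# The /export?format=csv&gid=... pattern is an assumption, not taken from
# the answer above; check that it works for your spreadsheet.
def csv_export_url(key, gid=0):
    query = urlencode({"format": "csv", "gid": gid})
    return "https://docs.google.com/spreadsheets/d/{}/export?{}".format(key, query)

url = csv_export_url("YOUR_SPREADSHEET_KEY")  # placeholder key
print(url)
```

If the pattern holds for your sheet, the resulting URL can be passed to requests.get, or directly to pd.read_csv.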
If the data is private, the approach is much the same, but you will have to do some auth gymnastics...
From the Google Colab snippets:
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
worksheet = gc.open('Your spreadsheet name').sheet1
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)
# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)
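Note that get_all_values returns the header row as the first list, so the DataFrame built above ends up with numeric column names and the header as its first data row. A minimal sketch of promoting the first row to column names in plain Python, assuming the sheet's first row really is a header; rows_to_records is a hypothetical helper and the sample rows are made up for illustration:

```python
# Hypothetical helper: treat the first row as the header and turn the
# remaining rows into one dict per row, keyed by column name.
def rows_to_records(rows):
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]

# Made-up sample shaped like the output of worksheet.get_all_values()
rows = [["name", "score"], ["alice", "90"], ["bob", "85"]]
records = rows_to_records(rows)
print(records)
```

Passing the result to pd.DataFrame(records) gives a frame whose columns are named after the header row. All values remain strings, so numeric columns still need converting.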