
Skip first line in import statement using gc.open_by_url from gspread (i.e. add header=0)

What is the gspread equivalent of header=0 in pandas, which treats the first line as the header?

pandas import statement (correct):

import pandas as pd

# gcp / google sheets URL
df_URL = "https://docs.google.com/spreadsheets/d/1wKtvNfWSjPNC1fNmTfUHm7sXiaPyOZMchjzQBt1y_f8/edit?usp=sharing"

raw_dataset = pd.read_csv(df_URL, na_values='?', sep=';',
                          skipinitialspace=True, header=0, index_col=None)

With gspread, so far I import the data, promote the first row to the header, and then drop that first row. The problem is that every value in the resulting DataFrame is a string. I would like the first line to be recognised as the header right away in the import statement.

gspread import statement that needs a header=0 equivalent:

import pandas as pd
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials


# gcp / google sheets url
df_URL = "https://docs.google.com/spreadsheets/d/1wKtvNfWSjPNC1fNmTfUHm7sXiaPyOZMchjzQBt1y_f8/edit?usp=sharing"

# importing the data from Google Drive  setup
gc = gspread.authorize(GoogleCredentials.get_application_default())

# read data and put it in dataframe
g_sheets = gc.open_by_url(df_URL) 

df = pd.DataFrame(g_sheets.get_worksheet(0).get_all_values())

  
# change first row to header
df = df.rename(columns=df.iloc[0]) 

# drop first row
df.drop(index=df.index[0], axis=0, inplace=True) 

Looking at the API documentation, you probably want to use:

df = pd.DataFrame(g_sheets.get_worksheet(0).get_all_records(head=1))

The .get_all_records method returns a list of dictionaries, one per data row, with the column headers as the keys and the cell values of that row as the values. The argument head=<int> determines which row to use as keys; rows start from 1 and follow the numbering of the spreadsheet.
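To make the shape of that return value concrete, here is a minimal pure-Python sketch of how a header row turns a list of rows (the shape .get_all_values() returns) into records. The helper rows_to_records and the sample data are hypothetical illustrations, not gspread's actual implementation.

```python
def rows_to_records(rows, head=1):
    """Turn a list of rows into a list of dicts keyed by the header row.

    `head` is 1-based, mirroring spreadsheet row numbering.
    Hypothetical helper for illustration only.
    """
    header = rows[head - 1]
    # Every row after the header row becomes one record
    return [dict(zip(header, row)) for row in rows[head:]]


# Hypothetical sample, shaped like the output of get_all_values()
sample_rows = [
    ["name", "price"],
    ["apple", "1.5"],
    ["pear", "-"],
]

records = rows_to_records(sample_rows, head=1)
print(records)
```

Passing a list of records like this to pd.DataFrame gives one column per key, so the header row never appears as data.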

Cells that do not look like numbers, such as empty cells and dash-only cells ('-'), come back from .get_all_records() as strings, so the data frame constructor, pd.DataFrame, produces object-typed columns rather than floats. To convert the data to floats, we need to replace the empty strings and the dash-only strings with NA-type values, then convert to float.

Luckily, a pandas DataFrame has a convenient method for replacing values: .replace. We can pass it a mapping from each string we want treated as NA to None, which becomes NaN once the data is cast to float.

import pandas as pd

data = g_sheets.get_worksheet(0).get_all_records(head=1)

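# Map placeholder strings to None so they become NaN after the float cast
na_strings_map = {
    '-': None,
    '': None,
}

# astype(float) assumes every column holds numeric data
df = pd.DataFrame(data).replace(na_strings_map).astype(float)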
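Note that .astype(float) raises an error if any column contains genuine text. As a hedged alternative, the sketch below converts only selected columns with pd.to_numeric and errors="coerce", which turns unparseable cells into NaN. The sample records and the column names are hypothetical and stand in for the output of .get_all_records().

```python
import pandas as pd

# Hypothetical records, shaped like the output of get_all_records()
data = [
    {"name": "apple", "price": 1.5, "qty": 3},
    {"name": "pear", "price": "-", "qty": ""},
]

df = pd.DataFrame(data)

# Columns assumed to be numeric; the names are illustrative only
numeric_cols = ["price", "qty"]

# errors="coerce" turns anything unparseable ('-', '') into NaN
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

This leaves text columns such as name untouched, at the cost of having to list the numeric columns explicitly.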

