String becomes float in BigQuery table after import from pandas dataframe
I have a pandas dataframe with the following dtypes:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 579585 entries, 0 to 579613
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 itemName 579585 non-null object
1 itemId 579585 non-null string
2 Count 579585 non-null int32
3 Sales 579585 non-null float64
4 Date 579585 non-null datetime64[ns]
5 Unit_margin 579585 non-null float64
6 GrossProfit 579585 non-null float64
dtypes: datetime64[ns](1), float64(3), int32(1), object(1), string(1)
memory usage: 33.2+ MB
I upload it to a BigQuery table using:
df_extended_full.to_gbq('<MY DATASET>.profit', project_id='<MY PROJECT>', chunksize=None, if_exists='append', auth_local_webserver=False, location=None, progress_bar=True)
Everything seems to work well, except that the itemId column, which is a string, becomes a float, so all leading zeros (which I need) are deleted wherever they occur.
I could of course define a schema for my table, but I want to avoid that. What am I missing?
The problem is in the to_gbq step. For some reason its output omits the quotes around the data field, and without quotes the datatype changes to a number.
BigQuery needs this format:
{"itemId": "12345", "mappingId":"abc123"}
You sent this format:
{"itemId": 12345, "mappingId":abc123}
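To see why the missing quotes matter, here is a minimal sketch using only Python's standard json module. It does not reproduce what pandas-gbq does internally; it only illustrates that a value serialized as a number cannot keep its leading zeros. The ID "00123" is a made-up value for illustration.

```python
import json

item_id = "00123"  # hypothetical ID with leading zeros

# Serialized as a string: the quotes preserve every character.
as_string = json.dumps({"itemId": item_id})

# Serialized as a number: leading zeros cannot be represented.
as_number = json.dumps({"itemId": float(item_id)})

print(as_string)  # {"itemId": "00123"}
print(as_number)  # {"itemId": 123.0}

# Reading the values back shows that the zeros are gone for good.
restored_string = json.loads(as_string)["itemId"]
restored_number = json.loads(as_number)["itemId"]
```

Once the value has been stored as a float, the zeros cannot be recovered on the BigQuery side, so the fix has to happen before or during the upload.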
One solution in this case is to cast the “itemId” field in pandas using “astype”. The pandas documentation for astype has more details.
This is an example:
df['externalId'] = df['externalId'].astype('str')
Another option is to pass the table_schema parameter to the to_gbq method, listing the BigQuery table fields that the DataFrame columns should conform to:
[{'name': 'col1', 'type': 'STRING'},...]
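As a rough sketch of how such a list could be built for the dataframe in the question, the helper below maps dtype names to BigQuery types. The helper name build_table_schema and the DTYPE_TO_BQ mapping are assumptions made for illustration, not part of pandas-gbq. In particular, TIMESTAMP for the Date column is a guess, and DATETIME may suit your table better.

```python
# Assumed mapping from pandas dtype names to BigQuery types (illustrative only).
DTYPE_TO_BQ = {
    "object": "STRING",
    "string": "STRING",
    "int32": "INTEGER",
    "int64": "INTEGER",
    "float64": "FLOAT",
    "datetime64[ns]": "TIMESTAMP",
}


def build_table_schema(dtypes):
    """Turn {column: dtype name} into the list-of-dicts form used by table_schema."""
    return [
        {"name": column, "type": DTYPE_TO_BQ[str(dtype)]}
        for column, dtype in dtypes.items()
    ]


# Column dtypes copied from the df.info() output in the question.
question_dtypes = {
    "itemName": "object",
    "itemId": "string",
    "Count": "int32",
    "Sales": "float64",
    "Date": "datetime64[ns]",
    "Unit_margin": "float64",
    "GrossProfit": "float64",
}

table_schema = build_table_schema(question_dtypes)
print(table_schema[1])  # {'name': 'itemId', 'type': 'STRING'}
```

The resulting list can then be passed as table_schema=table_schema in the to_gbq call. Note that with if_exists='append' the target table already has a schema, so if itemId was created as FLOAT there, the table may need to be recreated before the STRING type takes effect.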
As a last option, you can switch from pandas-gbq to google-cloud-bigquery. A comparison of the two libraries is available.