
String becomes float in BigQuery table after import from pandas dataframe

I have a pandas dataframe with the following dtypes:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 579585 entries, 0 to 579613
Data columns (total 7 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   itemName     579585 non-null  object        
 1   itemId       579585 non-null  string        
 2   Count        579585 non-null  int32         
 3   Sales        579585 non-null  float64       
 4   Date         579585 non-null  datetime64[ns]
 5   Unit_margin  579585 non-null  float64       
 6   GrossProfit  579585 non-null  float64       
dtypes: datetime64[ns](1), float64(3), int32(1), object(1), string(1)
memory usage: 33.2+ MB

I upload it to a BigQuery table using:

df_extended_full.to_gbq('<MY DATASET>.profit', project_id='<MY PROJECT>', chunksize=None, if_exists='append', auth_local_webserver=False, location=None, progress_bar=True)

Everything seems to work well, except that the itemId column, which is a string, becomes a float, so all leading zeros (which I need) are removed wherever they occur.

I could of course define a schema for my table, but I want to avoid that. What am I missing?

The problem lies in the to_gbq step. For some reason its output omits the quotes around the field value, and without quotes the value's data type is changed to a number.

BigQuery needs this format:

{"itemId": "12345", "mappingId":"abc123"}

You sent this format:

{"itemId": 12345, "mappingId":abc123}
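To illustrate why the missing quotes matter, here is a minimal sketch using only the standard json module. It does not reproduce what to_gbq does internally; the value "00123" is a hypothetical ID chosen to show the effect on leading zeros.

```python
import json

item_id = "00123"  # hypothetical ID with leading zeros

# Serialized as a string: the quotes are kept, and so are the zeros.
quoted = json.dumps({"itemId": item_id})

# Serialized as a number: the value must be parsed first, which drops the zeros.
unquoted = json.dumps({"itemId": float(item_id)})

print(quoted)    # {"itemId": "00123"}
print(unquoted)  # {"itemId": 123.0}
```

Once the value has been treated as a number, the leading zeros cannot be recovered from it.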

One solution in this case is to cast the itemId field in pandas using astype. See the pandas documentation for astype for more details.

This is an example:

df['externalId'] = df['externalId'].astype('str')
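Applied to the column from the question, the cast could look like the sketch below. The two-row DataFrame and its values are made up for illustration; only the column name itemId and its "string" dtype come from the question. The check at the end confirms that the cast itself keeps the leading zeros, though whether it changes what to_gbq uploads is not verified here.

```python
import pandas as pd

# Hypothetical sample data; itemId uses the "string" extension dtype as in the question.
df = pd.DataFrame(
    {
        "itemId": pd.array(["00123", "04567"], dtype="string"),
        "Count": [1, 2],
    }
)

# Cast the column to plain Python strings before uploading.
df["itemId"] = df["itemId"].astype("str")

values = df["itemId"].tolist()
print(values)  # ['00123', '04567']
```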

Another option is to use the table_schema parameter of the to_gbq method, listing the BigQuery table fields that the DataFrame columns should conform to.

[{'name': 'col1', 'type': 'STRING'},...]
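If writing the schema by hand is the part to avoid, one possibility is to derive the list from the dtypes. The sketch below is only an assumption about how that might be done: build_table_schema and the DTYPE_TO_BQ mapping are hypothetical names, and the mapping covers only the dtypes shown in the question.

```python
# Assumed mapping from pandas dtype names to BigQuery column types.
DTYPE_TO_BQ = {
    "object": "STRING",
    "string": "STRING",
    "int32": "INTEGER",
    "float64": "FLOAT",
    "datetime64[ns]": "TIMESTAMP",
}


def build_table_schema(dtypes):
    """Turn a {column name: dtype name} mapping into a table_schema list."""
    return [
        {"name": name, "type": DTYPE_TO_BQ[str(dtype)]}
        for name, dtype in dtypes.items()
    ]


# Column names and dtypes copied from the question's df.info() output.
dtypes = {
    "itemName": "object",
    "itemId": "string",
    "Count": "int32",
    "Sales": "float64",
    "Date": "datetime64[ns]",
    "Unit_margin": "float64",
    "GrossProfit": "float64",
}

table_schema = build_table_schema(dtypes)
print(table_schema[1])  # {'name': 'itemId', 'type': 'STRING'}

# The result would then be passed along with the original call, for example:
# df_extended_full.to_gbq('<MY DATASET>.profit', project_id='<MY PROJECT>',
#                         if_exists='append', table_schema=table_schema)
```

With a real DataFrame, dict(df.dtypes.astype(str)) should produce a suitable input mapping, although any dtype missing from DTYPE_TO_BQ would raise a KeyError.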

As a last option, you can switch from pandas-gbq to google-cloud-bigquery. See the comparison of the two libraries.
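A rough sketch of what the google-cloud-bigquery route might look like follows. The function name load_with_string_columns and its parameters are hypothetical, and the call needs valid credentials plus the google-cloud-bigquery and pyarrow packages, so it is defined here but not executed. Only itemId is pinned to STRING; the remaining column types are left for the library to infer.

```python
def load_with_string_columns(df, table_id, project, string_columns=("itemId",)):
    """Append df to table_id, forcing the given columns to STRING (sketch)."""
    # Imported lazily so the definition can be read without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.LoadJobConfig(
        # Partial schema: only the listed columns are pinned to STRING.
        schema=[bigquery.SchemaField(name, "STRING") for name in string_columns],
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    return job.result()  # Blocks until the load job finishes.


# Example call with the placeholders from the question (not run here):
# load_with_string_columns(df_extended_full, '<MY PROJECT>.<MY DATASET>.profit', '<MY PROJECT>')
```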
