Exporting pandas df with column of tuples to BQ throws pyarrow error
I have the following pandas dataframe:
import pandas as pd
df = pd.DataFrame({"id": [1, 2, 3], "items": [('a', 'b'), ('a', 'b', 'c'), tuple('d')]})
print(df)
   id      items
0   1     (a, b)
1   2  (a, b, c)
2   3       (d,)
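To see what pandas is actually storing before anything reaches BigQuery, the column can be inspected with standard pandas calls. A minimal sketch (it only rebuilds the example frame above): the items column has the generic object dtype and every cell is a plain Python tuple, which is what pyarrow is later asked to convert.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3], "items": [("a", "b"), ("a", "b", "c"), tuple("d")]}
)

# The dtype pandas reports for the column (a generic object column for tuples)
print(df["items"].dtype)

# The Python type of each cell in that column
cell_types = df["items"].map(type).unique()
print(cell_types)
```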
After registering my GCP/BQ credentials in the normal way...
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_my_creds.json"
... I try to export it to a BQ table:
import pandas_gbq
pandas_gbq.to_gbq(df, "my_table_name", if_exists="replace")
but I keep getting the following error:
Traceback (most recent call last):
File "<string>", line 4, in <module>
File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 1205, in to_gbq
...
File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 342, in bq_to_arrow_array
return pyarrow.Array.from_pandas(series, type=arrow_type)
File "pyarrow/array.pxi", line 915, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object
I have tried converting the tuple column to string with
df = df.astype({"items": str})
and adding a table_schema param to the pandas_gbq.to_gbq... line, but I keep getting the same error.
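For reference, astype(str) calls str() on each tuple, so the stored text is the tuple's printed form, brackets and quotes included. A small sketch comparing that with a hypothetical alternative that joins the elements into a comma-separated string (the ", " delimiter is an arbitrary choice, not something pandas-gbq requires):

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3], "items": [("a", "b"), ("a", "b", "c"), tuple("d")]}
)

# astype(str) calls str() on each tuple, so the text is the tuple's repr
as_repr = df.astype({"items": str})
print(as_repr["items"].tolist())

# Hypothetical alternative: join the tuple elements into a plain delimited string
as_joined = df.assign(items=df["items"].map(", ".join))
print(as_joined["items"].tolist())
```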
I have also tried replacing the pandas_gbq.to_gbq... line with the bq_client.load_table_from_dataframe method described here, but I still get the same
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object
error.
So I think this is an issue with pandas dtypes being separate from Python types: astype changes the Python type of each value (every tuple becomes a str), but the column's pandas dtype can remain the generic object dtype. Try also converting the dtype to match the type after the astype statement.
That is, this line:
df = df.astype({"items": str})
is replaced with:
df = df.astype({"items": str})
df = df.convert_dtypes()
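A local check of what the two lines above do to the dtypes, assuming a pandas version that has convert_dtypes (added in pandas 1.0). It only shows the dtype change on the example frame; it does not confirm that the BigQuery upload then succeeds, which would still need to be tested against your pandas, pyarrow, and client versions.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3], "items": [("a", "b"), ("a", "b", "c"), tuple("d")]}
)

# Step 1: turn each tuple into a Python str
df = df.astype({"items": str})
print(df.dtypes)

# Step 2: let pandas infer nullable extension dtypes; an all-str column
# becomes the dedicated "string" dtype and int64 becomes nullable Int64
df = df.convert_dtypes()
print(df.dtypes)
```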
Let me know if this works.