
Exporting pandas df with column of tuples to BQ throws pyarrow error

I have the following pandas DataFrame:

import pandas as pd
df = pd.DataFrame({"id": [1,2,3], "items": [('a', 'b'), ('a', 'b', 'c'), tuple('d')]})

>>> print(df)
   id      items
0   1     (a, b)
1   2  (a, b, c)
2   3       (d,)

After registering my GCP/BQ credentials in the normal way...

    import os
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path_to_my_creds.json"

... I try to export it to a BQ table:

import pandas_gbq
pandas_gbq.to_gbq(df, "my_table_name", if_exists="replace")

but I keep getting the following error:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 1205, in to_gbq
...
  File "/Users/max.epstein/opt/anaconda3/envs/rec2env/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 342, in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow/array.pxi", line 915, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 312, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object

I have tried converting the tuple column to string with df = df.astype({"items":str}) and adding a table_schema param to the pandas_gbq.to_gbq... line, but I keep getting this same error.

I have also tried replacing the pandas_gbq.to_gbq... line with the bq_client.load_table_from_dataframe method described here, but I still get the same pyarrow.lib.ArrowTypeError: Expected bytes, got a 'tuple' object error...

So I think this is an issue with pandas dtypes being separate from Python types: astype converts the values, but the column's pandas dtype can stay as the generic object dtype. Try also converting the dtype to match the type after the astype statement.

That is, this line:

df = df.astype({"items": str})

is replaced with:

df = df.astype({"items": str})
df = df.convert_dtypes()
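
Before uploading, it may help to check what the column actually holds after the conversion. This is a minimal sketch under assumptions: the names converted and leftover are illustrative, and the dtype name that gets printed depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3], "items": [("a", "b"), ("a", "b", "c"), ("d",)]}
)

# Convert the tuple values to strings, then let pandas pick matching dtypes.
converted = df.astype({"items": str}).convert_dtypes()

# Inspect the resulting dtypes (the exact names vary by pandas version).
print(converted.dtypes)

# Collect any values that are still not plain strings.
leftover = [v for v in converted["items"] if not isinstance(v, str)]
print("non-string values:", leftover)
```

If leftover is empty but the upload still fails with the tuple error, the frame being passed to to_gbq is probably not the converted one, for example because the result of astype was not assigned back.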

Let me know if this works.
