
How to convert pandas float64 type to NUMERIC BigQuery type?

I have a pandas DataFrame df:

                 DAT_RUN          DAT_FORECAST LIB_SOURCE  MES_LONGITUDE  MES_LATITUDE  MES_TEMPERATURE  MES_HUMIDITE  MES_PLUIE  MES_VITESSE_VENT  MES_U_WIND  MES_V_WIND
0   2022-03-29T00:00:00Z  2022-03-29T01:00:00Z    gfs_025          43.50          3.75        11.994824          72.0        0.0          2.653137   -2.402910   -1.124792
1   2022-03-29T00:00:00Z  2022-03-29T01:00:00Z    gfs_025          43.50          4.00        13.094824          74.3        0.0          2.976434   -2.972910   -0.144792
2   2022-03-29T00:00:00Z  2022-03-29T01:00:00Z    gfs_025          43.50          4.25        12.594824          75.3        0.0          3.128418   -2.702910    1.575208
3   2022-03-29T00:00:00Z  2022-03-29T01:00:00Z    gfs_025          43.50          4.50        12.094824          75.5        0.0          3.183418   -2.342910    2.155208

I convert the DAT_RUN and DAT_FORECAST columns to datetime:

import pandas as pd

df["DAT_RUN"]           = pd.to_datetime(df['DAT_RUN'],      format="%Y-%m-%dT%H:%M:%SZ") # previously "%Y-%m-%d %H:%M:%S"
df["DAT_FORECAST"]      = pd.to_datetime(df['DAT_FORECAST'], format="%Y-%m-%dT%H:%M:%SZ")
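With this format string the trailing Z is matched as a literal character, so the parsed values carry no time zone (which is why df.dtypes shows plain datetime64[ns]). pandas uses the same strftime-style directives, so the behaviour can be checked with the standard library alone. A minimal sketch; the %z variant, which Python 3.7+ accepts for a literal "Z", is my suggestion and not part of the question:

```python
from datetime import datetime, timedelta

raw = "2022-03-29T00:00:00Z"

# Same format as in the question: "Z" is matched literally and then discarded,
# so the result is a naive datetime.
naive = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
print(naive, naive.tzinfo)  # 2022-03-29 00:00:00 None

# With %z, Python 3.7+ reads "Z" as a UTC offset of +00:00.
aware = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S%z")
print(aware)  # 2022-03-29 00:00:00+00:00
```

If a time-zone-aware UTC column is wanted in pandas, pd.to_datetime also accepts utc=True.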

df.dtypes:

DAT_RUN             datetime64[ns]
DAT_FORECAST        datetime64[ns]
LIB_SOURCE                  object
MES_LONGITUDE              float64
MES_LATITUDE               float64
MES_TEMPERATURE            float64
MES_HUMIDITE               float64
MES_PLUIE                  float64
MES_VITESSE_VENT           float64
MES_U_WIND                 float64
MES_V_WIND                 float64

I use the bigquery.Client().load_table_from_dataframe() function to insert the data into a BigQuery table whose numeric columns have the NUMERIC type.

It raises this error:

pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)

I tried to fix it with:

import decimal

df["MES_LONGITUDE"] = df["MES_LONGITUDE"].astype(str).map(decimal.Decimal)

But it still fails. Thanks.
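The numbers in the error message suggest Arrow read 8-byte float64 values where it expected 16-byte (128-bit) decimals. As far as I know, the BigQuery client maps NUMERIC to a 128-bit decimal, and NUMERIC itself allows 38 digits of precision with a scale of 9. The sketch below compares decimal.Decimal built straight from a float with the string round-trip used in the question; fits_bigquery_numeric is a hypothetical helper, and its check is strict (trailing zeros count toward the scale):

```python
from decimal import Decimal


def fits_bigquery_numeric(d: Decimal) -> bool:
    """Hypothetical strict check against NUMERIC (precision 38, scale 9):
    at most 9 digits after the decimal point and 29 before it."""
    _sign, digits, exponent = d.as_tuple()
    if not isinstance(exponent, int):  # NaN and Infinity have no numeric exponent
        return False
    scale = max(-exponent, 0)                        # digits after the point
    integer_digits = max(len(digits) + exponent, 0)  # digits before the point
    return scale <= 9 and integer_digits <= 29


x = 11.994824  # a MES_TEMPERATURE value from the question

exact = Decimal(x)          # exact binary value of the float: dozens of decimals
via_str = Decimal(str(x))   # shortest repr of the float: Decimal('11.994824')

print(fits_bigquery_numeric(exact))    # False
print(fits_bigquery_numeric(via_str))  # True
```

Since the string round-trip already yields small-scale decimals, one unverified possibility is that the other NUMERIC columns were still float64 when the load ran.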

I managed to work around this issue with a decimal.Context; hope it helps:

import decimal

import numpy as np
import pandas as pd
from google.cloud import bigquery

df = pd.DataFrame(
    data={
        "MES_HUMIDITE": np.array([2.653137, 2.976434, 3.128418, 3.183418]),
        "MES_PLUIE": np.array([-2.402910, -2.972910, -2.702910, -2.342910]),
    },
    dtype="float",
)

Check the data types:

df.dtypes
# MES_HUMIDITE    float64
# MES_PLUIE       float64
# dtype: object

Initialize a Context with a precision of 7 significant digits, since that is the precision of the values in those columns. If different columns need different precisions, you can create one Context per column:

context = decimal.Context(prec=7)
df["MES_HUMIDITE"] = df["MES_HUMIDITE"].apply(context.create_decimal_from_float)
df["MES_PLUIE"] = df["MES_PLUIE"].apply(context.create_decimal_from_float)
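Note that Context(prec=7) limits significant digits, not decimal places. A minimal sketch of the one-Context-per-column idea on plain Python lists, so it runs without pandas; the CONTEXTS mapping, its precisions and the to_decimals helper are hypothetical:

```python
import decimal

# Hypothetical per-column precisions (significant digits, not decimal places).
CONTEXTS = {
    "MES_HUMIDITE": decimal.Context(prec=7),
    "MES_LONGITUDE": decimal.Context(prec=4),
}


def to_decimals(column, values):
    """Round each float to the significant digits configured for its column."""
    ctx = CONTEXTS[column]
    return [ctx.create_decimal_from_float(v) for v in values]


print(to_decimals("MES_HUMIDITE", [2.653137, -2.40291]))
# [Decimal('2.653137'), Decimal('-2.402910')]
print(to_decimals("MES_LONGITUDE", [43.5, 3.75]))
# [Decimal('43.5'), Decimal('3.75')]

# Because precision counts significant digits, large values lose decimals...
print(decimal.Context(prec=7).create_decimal_from_float(12345.6789))  # 12345.68
# ...and tiny values end up with more than NUMERIC's 9 fractional digits.
print(decimal.Context(prec=7).create_decimal_from_float(1e-12))  # 1.000000E-12
```

In the DataFrame version, the same Context objects would be passed to .apply() column by column, as in the answer.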

Now each item is a Decimal object:

df["MES_HUMIDITE"][0]
# Decimal('2.653137')

The dtypes have changed: pandas stores the Decimal values with the object dtype since, as far as I know, Decimal is not a native pandas data type:

df.dtypes
# MES_HUMIDITE    object
# MES_PLUIE       object
# dtype: object

Then load the DataFrame with an explicit schema that declares both columns as NUMERIC:

table_id = "test_dataset.test"
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("MES_HUMIDITE", "NUMERIC"),
        bigquery.SchemaField("MES_PLUIE", "NUMERIC"),
    ],
    write_disposition="WRITE_TRUNCATE",
)

client = bigquery.Client.from_service_account_json("/path_to_key.json")
job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
job.result()
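An alternative to picking a precision per column (a swapped-in technique, not what this answer does) is to quantize every value to NUMERIC's fixed scale of 9 decimal places, using a 38-digit context so that out-of-range values raise decimal.InvalidOperation instead of being loaded. The helper name and the choice of mapping NaN to None (NULL) are assumptions:

```python
import decimal
import math
from decimal import Decimal

# BigQuery NUMERIC: 38 significant digits, 9 of them after the decimal point.
NUMERIC_CONTEXT = decimal.Context(prec=38)
NUMERIC_SCALE = Decimal("1e-9")


def to_bigquery_numeric(x: float):
    """Hypothetical helper: float -> Decimal with exactly 9 fractional digits."""
    if math.isnan(x):
        return None  # assumption: a missing value should become NULL
    if math.isinf(x):
        raise ValueError(f"{x!r} cannot be stored as NUMERIC")
    # repr(x) is the shortest string that round-trips, e.g. '11.994824'.
    # quantize signals InvalidOperation if the result needs more than 38 digits.
    return Decimal(repr(x)).quantize(NUMERIC_SCALE, context=NUMERIC_CONTEXT)


print(to_bigquery_numeric(11.994824))     # 11.994824000
print(to_bigquery_numeric(1e-12))         # 0E-9
print(to_bigquery_numeric(float("nan")))  # None
```

On the DataFrame this would be applied per column, e.g. df["MES_LONGITUDE"].map(to_bigquery_numeric), before calling load_table_from_dataframe.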

That said, decimal types are generally recommended for financial calculations. I don't know your exact use case, but you are probably safe using FLOAT64, at least for latitude and longitude.
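To put the FLOAT64 suggestion in numbers: the gap between a float64 coordinate and its decimal text is far smaller than the 9 decimal places NUMERIC can express. A small standard-library check; the float_error helper and the sample coordinate 43.123456 are illustrative only:

```python
from decimal import Decimal


def float_error(text: str) -> Decimal:
    """Absolute difference between the exact float64 value and the decimal text."""
    return abs(Decimal(float(text)) - Decimal(text))


print(float_error("43.50"))     # 0.00 (43.5 is exactly representable)
err = float_error("43.123456")  # hypothetical coordinate
print(err < Decimal("1e-9"))    # True: below NUMERIC's 9 fractional digits
# At roughly 111 km per degree of latitude, the error in metres is negligible.
print(err * 111_000)
```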

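Note: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions contact: yoyou2525@163.com.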
