Sending an Int64-type Pandas DataFrame to a GCP Spanner INT64 column

Question

I am using Pandas DataFrames. I have a column from a CSV that contains integers mixed with nulls.

I am trying to convert this column and insert it into Spanner in as generalizable a way as possible (so I can reuse the same code for future jobs), which limits my ability to use sentinel values. However, DataFrames cannot hold NaNs in a pure int column, so you have to use Int64. When I try to insert this into Spanner, I get an error saying the value is not an int64 type, whereas pure Python ints do work. Is there an automatic way to convert Pandas Int64 values to int values during the insert? Converting the column before inserting doesn't work, again because of the null values. Is there another way around this?

Trying to convert from a Series goes like this:

>>> import numpy as np
>>> import pandas as pd
>>> s2 = pd.Series([3.0, 5.0])
>>> s2
0    3.0
1    5.0
dtype: float64
>>> s1 = pd.Series([3.0, None])
>>> s1
0    3.0
1    NaN
dtype: float64
>>> df = pd.DataFrame(data=[s1, s2], dtype=np.int64)
>>> df
   0    1
0  3  NaN
1  3  5.0
>>> df = pd.DataFrame(data={"nullable": s1, "nonnullable": s2}, dtype=np.int64)

This last command produces the error ValueError: Cannot convert non-finite values (NA or inf) to integer

Answer 1

I was unable to reproduce your issue; everything seems to work as expected.

Is it possible that you are writing null values to a non-nullable column?

Retrieving the schema of a Spanner table

from google.cloud import spanner

client = spanner.Client()
database = client.instance('testinstance').database('testdatabase')
table_name='inttable'

query = f'''
SELECT
t.column_name,
t.spanner_type,
t.is_nullable
FROM
information_schema.columns AS t
WHERE
t.table_name = '{table_name}'
'''

with database.snapshot() as snapshot:
    print(list(snapshot.execute_sql(query)))
    # [['nonnullable', 'INT64', 'NO'], ['nullable', 'INT64', 'YES']]

Inserting into Spanner from a Pandas DataFrame

from google.cloud import spanner

import numpy as np
import pandas as pd

client = spanner.Client()
instance = client.instance('testinstance')
database = instance.database('testdatabase')


def insert(df):
    with database.batch() as batch:
        batch.insert(
            table='inttable',
            columns=(
                'nonnullable', 'nullable'),
            values=df.values.tolist()
        )

print("Succeeds in inserting int rows.")
d = {'nonnullable': [1, 2], 'nullable': [3, 4]}
df = pd.DataFrame(data=d, dtype=np.int64)
insert(df)

print("Succeeds in inserting rows with None in nullable columns.")
d = {'nonnullable': [3, 4], 'nullable': [None, 6]}
df = pd.DataFrame(data=d, dtype=np.int64)
insert(df)

print("Fails (as expected) when inserting a row with None in a non-nullable column.")
d = {'nonnullable': [5, None], 'nullable': [6, 0]}
df = pd.DataFrame(data=d, dtype=np.int64)
insert(df)
# Fails with "google.api_core.exceptions.FailedPrecondition: 400 nonnullable must not be NULL in table inttable."
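The question of whether nulls are being written to a non-nullable column can also be checked locally before calling insert. The following is a minimal sketch, not part of the original answer: the helper name `null_violations` is hypothetical, and it assumes the schema rows have the `[column_name, spanner_type, is_nullable]` shape returned by the `information_schema.columns` query above.

```python
import pandas as pd


def null_violations(df, schema_rows):
    """Return the non-nullable columns of df that contain missing values.

    schema_rows is assumed to look like the query result shown earlier:
    [[column_name, spanner_type, is_nullable], ...] with is_nullable
    being 'YES' or 'NO'.
    """
    non_nullable = {
        name for name, _spanner_type, is_nullable in schema_rows
        if is_nullable == "NO"
    }
    # isna() treats None, NaN and pd.NA as missing.
    return [
        col for col in df.columns
        if col in non_nullable and df[col].isna().any()
    ]


schema_rows = [["nonnullable", "INT64", "NO"], ["nullable", "INT64", "YES"]]
df = pd.DataFrame({"nonnullable": [5, None], "nullable": [6, 0]})
violations = null_violations(df, schema_rows)
print(violations)  # ['nonnullable']
```

An empty list means no non-nullable column holds a missing value, so a FailedPrecondition error like the one above should not come from this cause.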

Answer 2

My solution was to leave it as NaN (it turns out NaN is treated as the string 'nan'). Then, at the very end, just before inserting into the Spanner DB, I replaced all NaN values with None in the DataFrame. I used code from another SO answer: df.replace({pd.np.nan: None}) (pd.np is an alias for NumPy that newer pandas releases no longer provide; np.nan from a direct numpy import is the equivalent). Spanner was seeing the NaN as a 'nan' string and rejecting it for insertion into an INT64 column. None is treated as NULL and can be inserted into Spanner with no issue.
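The same idea can be written as an explicit per-cell conversion that also handles the nullable Int64 dtype mentioned in the question. This is a sketch under assumptions rather than the answerer's code: the helper names `to_python` and `to_spanner_rows` are hypothetical, and the result is meant to be passed as the `values` argument of `batch.insert` in place of `df.values.tolist()`.

```python
import numbers

import pandas as pd


def to_python(value):
    """Map one DataFrame cell to a plain Python object.

    Missing values (None, NaN, pd.NA) become None; integer-like
    values (including NumPy integers) become Python int.
    """
    if pd.isna(value):
        return None
    # bool is a subclass of int, so leave it untouched.
    if isinstance(value, numbers.Integral) and not isinstance(value, bool):
        return int(value)
    return value


def to_spanner_rows(df):
    """Return the rows of df as lists of plain Python objects."""
    return [
        [to_python(v) for v in row]
        for row in df.itertuples(index=False, name=None)
    ]


df = pd.DataFrame({
    "nonnullable": pd.array([1, 2], dtype="Int64"),
    "nullable": pd.array([3, None], dtype="Int64"),
})
rows = to_spanner_rows(df)
print(rows)  # [[1, 3], [2, None]]
```

Note that a float64 column such as [3.0, NaN] would keep its values as floats here; casting it to the nullable dtype first with astype("Int64") would give integers.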

Answer 1: score 0, posted 2019-03-26 16:48:49

Answer 2: score 0, accepted, posted 2019-03-27 17:29:15
