How do I batch upsert data into Google Cloud Spanner using the Python client library?
I would like to upsert the contents of a pandas dataframe into a table in a Google Cloud Spanner database. The documentation here recommends using the insert_or_update() method of the batch object.
If the batch object is created by running this:
from google.cloud import spanner_v1
client = spanner_v1.Client()
batch = client.batch()
then this object does not have that method available. Running dir(client) gives me these results:
['SCOPE',
'_SET_PROJECT',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__getstate__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_credentials',
'_database_admin_api',
'_determine_default',
'_http',
'_http_internal',
'_instance_admin_api',
'_item_to_instance',
'copy',
'credentials',
'database_admin_api',
'from_service_account_json',
'instance',
'instance_admin_api',
'list_instance_configs',
'list_instances',
'project',
'project_name',
'user_agent']
How do I do a batch upsert in Spanner?
The snippets have an example of a batch insert:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/spanner/cloud-client/snippets.py#L72
Note that the batch is created from a database object with database.batch(), not from the client. I checked that the batch object created that way also has an insert_or_update method:
['__class__', '__delattr__', '__dict__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_check_state', '_mutations', '_session', 'commit', 'committed', 'delete', 'insert', 'insert_or_update', 'replace', 'update']
Can you try that out?
If you have a pandas dataframe, for example a random 5 x 3 one with columns a, b, c, you can extract the column names and the rows from it and batch insert them.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 3)),
columns=['a', 'b', 'c'])
You can insert this into Google Cloud Spanner by extracting the columns and values from df and batch inserting:
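from google.cloud import spanner
# instance_id and database_id are placeholders for your own Spanner instance and database IDs
spanner_client = spanner.Client()
instance = spanner_client.instance(instance_id)
database = instance.database(database_id)
# Convert the dataframe to plain Python lists of column names and row values
columns = df.columns.tolist()
values = df.values.tolist()
# The batch is committed when the with block exits without an error
with database.batch() as batch:
    batch.insert(
        table='table',
        columns=columns,
        values=values
    )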
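Since the question asks for an upsert rather than a plain insert, the same pattern can be sketched with insert_or_update() in place of insert(). The helper below is only a minimal sketch, not part of the client library: batch_upsert, chunked and chunk_size are hypothetical names, and it assumes database is a google.cloud.spanner Database object like the one created above. Splitting the rows into chunks is an added assumption, meant to keep each commit small because Spanner limits the number of mutations per commit.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def batch_upsert(database, table, columns, rows, chunk_size=500):
    """Upsert rows in chunks and return the number of commits made.

    `database` is assumed to be a google.cloud.spanner Database object;
    any object whose .batch() context manager yields something with
    insert_or_update(table, columns, values) will work.
    """
    commits = 0
    for chunk in chunked(rows, chunk_size):
        # Each with block is one commit, applied when the block exits cleanly.
        with database.batch() as batch:
            batch.insert_or_update(table=table, columns=columns, values=chunk)
        commits += 1
    return commits
```

With the dataframe from above, this would be called as batch_upsert(database, 'table', df.columns.tolist(), df.values.tolist()). The columns passed must include the table's primary key columns, since those determine whether a row is inserted or updated.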