
How to import CSV file into Cloud Bigtable via Cloud Dataflow with Python?

The easiest way to describe what I'm doing is essentially to follow this tutorial: Import a CSV file into a Cloud Bigtable table, but in the section where they start the Dataflow job, they use Java:

mvn package exec:exec \
    -DCsvImport \
    -Dbigtable.projectID=YOUR_PROJECT_ID \
    -Dbigtable.instanceID=YOUR_INSTANCE_ID \
    -Dbigtable.table="YOUR_TABLE_ID" \
    -DinputFile="YOUR_FILE" \
    -Dheaders="YOUR_HEADERS"

Is there a way to do this particular step in Python? The closest I could find was the apache_beam.examples.wordcount example here, but ultimately I'd like to see some code where I can add some customization into the Dataflow job using Python.

There is a connector for writing to Cloud Bigtable that you can use as a starting point for importing a CSV file.
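As a hedged illustration (not the connector's official sample): a minimal Apache Beam Python pipeline that reads a CSV from Cloud Storage and writes each line to Bigtable via the experimental apache_beam.io.gcp.bigtableio.WriteToBigTable transform. The bucket, project/instance/table IDs, the 'csv' column family, and the header list are placeholders, and the first CSV column is assumed to be the row key.

import datetime

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud.bigtable import row as bt_row

HEADERS = ['rowkey', 'col1', 'col2']  # hypothetical CSV headers


def csv_line_to_direct_row(line):
    # Naive split; use the csv module if your fields can be quoted.
    values = line.split(',')
    # The first column is assumed to be the Bigtable row key.
    direct_row = bt_row.DirectRow(row_key=values[0].encode())
    for header, value in zip(HEADERS[1:], values[1:]):
        # 'csv' is a placeholder column family that must already exist
        # on the target table.
        direct_row.set_cell('csv', header.encode(), value.encode(),
                            datetime.datetime.utcnow())
    return direct_row


def run():
    options = PipelineOptions(
        runner='DataflowRunner',  # or 'DirectRunner' to test locally
        project='YOUR_PROJECT_ID',
        region='us-central1',
        temp_location='gs://YOUR_BUCKET/tmp')
    with beam.Pipeline(options=options) as p:
        (p
         | 'ReadCsv' >> beam.io.ReadFromText('gs://YOUR_BUCKET/YOUR_FILE.csv')
         | 'ToDirectRow' >> beam.Map(csv_line_to_direct_row)
         | 'WriteToBigtable' >> WriteToBigTable(
               project_id='YOUR_PROJECT_ID',
               instance_id='YOUR_INSTANCE_ID',
               table_id='YOUR_TABLE_ID'))


if __name__ == '__main__':
    run()

Run it like any other Beam Python pipeline (python pipeline.py); with DataflowRunner it launches the same kind of Dataflow job that the Java command above does.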

Google Dataflow does not have a Python connector for BigTable.

Here is a link to the Apache Beam connectors for both Java and Python:

Built-in I/O Transforms

I'd suggest doing something like this.

DataFrame.to_gbq(destination_table, project_id, chunksize=10000, verbose=True, reauth=False, if_exists='fail', private_key=None)

You will find all parameters, and explanations of each, in the link below.

https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.to_gbq.html#pandas.DataFrame.to_gbq
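For example, a minimal sketch of that suggestion, assuming the CSV fits in memory (the file, dataset, and table names are placeholders; per the linked docs, to_gbq writes the frame to a BigQuery table):

import pandas as pd

# Placeholder names; to_gbq loads the DataFrame into a BigQuery table.
df = pd.read_csv('YOUR_FILE.csv')
df.to_gbq(destination_table='your_dataset.your_table',
          project_id='YOUR_PROJECT_ID',
          chunksize=10000,
          if_exists='fail')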
