
Support for Cloud Bigtable as Sink in Cloud Dataflow

Are there plans to enable Cloud Dataflow to write data to Cloud Bigtable? Is it even possible?

Adding a custom Sink to handle the IO would probably be the cleanest choice.

As a workaround, I tried connecting to Bigtable (same project) in a simple DoFn, opening the connection and table in the startBundle step and closing them in finishBundle.

Moreover, I added the bigtable-hbase jar (0.1.5) to the classpath and a modified version of hbase-site.xml to the resource folder, where it gets picked up.

When running in the cloud, I get an "NPN/ALPN extensions not installed" exception.

When running locally, I get an exception stating that ComputeEngineCredentials cannot find the metadata server, despite having set GOOGLE_APPLICATION_CREDENTIALS to the generated JSON key file.

Any help would be greatly appreciated.

We now have a Cloud Bigtable / Dataflow connector. You can see more at: https://cloud.google.com/bigtable/docs/dataflow-hbase

Cloud Bigtable requires the NPN/ALPN networking jar. This is currently not installed on the Dataflow workers, so accessing Cloud Bigtable directly from a ParDo won't work.

One possible workaround is to use the HBase REST API: set up a REST server on a VM outside of Dataflow and have it access Cloud Bigtable. These instructions might help.

You could then issue REST requests to this REST server. This could get somewhat complicated if you're sending a lot of requests (i.e. you are processing large amounts of data and need to set up multiple instances of your REST server and load balance across them).
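As a rough illustration of what such a request could look like, here is a minimal sketch of a single-cell write using the HBase REST server's JSON format, in which the row key, column and value are Base64-encoded. The class name `HBaseRestPut`, the base URL, and the table, row and column names are assumptions made up for this example, and it uses the Java 11+ `java.net.http` client rather than anything Dataflow-specific.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper for writing one cell through an HBase REST server. */
final class HBaseRestPut {
    private HBaseRestPut() {}

    /** Base64-encodes a string as UTF-8 bytes, as the HBase REST JSON format expects. */
    static String b64(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the JSON body for one cell; column is given as "family:qualifier". */
    static String cellJson(String rowKey, String column, String value) {
        return "{\"Row\":[{\"key\":\"" + b64(rowKey) + "\",\"Cell\":[{\"column\":\""
                + b64(column) + "\",\"$\":\"" + b64(value) + "\"}]}]}";
    }

    /**
     * Builds a PUT request against baseUrl/table/rowKey.
     * Assumes table and rowKey contain only URL-safe characters.
     */
    static HttpRequest putRequest(String baseUrl, String table, String rowKey,
                                  String column, String value) {
        URI uri = URI.create(baseUrl + "/" + table + "/" + rowKey);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(cellJson(rowKey, column, value)))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, for example from inside a DoFn. Sending one request per element is the simplest shape but also the slowest; the same JSON body can carry several rows, which may help with the request volume mentioned above.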
