
Apache Beam Streaming from Pub/Sub to ElasticSearch

I'm writing a Java streaming pipeline with Apache Beam that reads messages from Google Cloud PubSub and should write them into an ElasticSearch instance. Currently, I'm using the direct runner, but the plan is to deploy the solution on Google Cloud Dataflow.

First, I wrote a pipeline that reads from PubSub and writes to text files, and it works. Then I set up the ElasticSearch instance, and that works too: I wrote some documents with curl and it was easy.

Then, when I tried to perform the write with Beam's ElasticSearch connector, I started getting an error: java.lang.NoSuchMethodError: org.elasticsearch.client.RestClient.performRequest, even though I added the dependency to my pom.xml file.

What I'm doing is essentially this:

messages.apply(
        "TwoMinWindow",
        Window.into(FixedWindows.of(new Duration(120 * 1000)))
).apply(
        "ElasticWrite",
        ElasticsearchIO.write()
                .withConnectionConfiguration(
                        ElasticsearchIO.ConnectionConfiguration
                                .create(new String[]{"http://xxx.xxx.xxx.xxx:9200"}, "streaming_data", "string")
                                .withUsername("xxxx")
                                .withPassword("xxxxxxxx")
                )
);

Using the DirectRunner, I'm able to connect to PubSub, but I get an exception when the pipeline tries to connect to the ElasticSearch instance:

java.lang.NoSuchMethodError: org.elasticsearch.client.RestClient.performRequest(Ljava/lang/String;Ljava/lang/String;[Lorg/apache/http/Header;)Lorg/elasticsearch/client/Response;
    at org.apache.beam.sdk.util.UserCodeException.wrap (UserCodeException.java:34)
    at org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$Write$WriteFn$DoFnInvoker.invokeSetup (Unknown Source)
    at org.apache.beam.sdk.transforms.reflect.DoFnInvokers.tryInvokeSetupFor (DoFnInvokers.java:50)
    at org.apache.beam.runners.direct.DoFnLifecycleManager$DeserializingCacheLoader.load (DoFnLifecycleManager.java:104)
    at org.apache.beam.runners.direct.DoFnLifecycleManager$DeserializingCacheLoader.load (DoFnLifecycleManager.java:91)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture (LocalCache.java:3528)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync (LocalCache.java:2277)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad (LocalCache.java:2154)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get (LocalCache.java:2044)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get (LocalCache.java:3952)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad (LocalCache.java:3974)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get (LocalCache.java:4958)
    at org.apache.beam.runners.direct.DoFnLifecycleManager.get (DoFnLifecycleManager.java:61)
    at org.apache.beam.runners.direct.ParDoEvaluatorFactory.createEvaluator (ParDoEvaluatorFactory.java:129)
    at org.apache.beam.runners.direct.ParDoEvaluatorFactory.forApplication (ParDoEvaluatorFactory.java:79)
    at org.apache.beam.runners.direct.TransformEvaluatorRegistry.forApplication (TransformEvaluatorRegistry.java:169)
    at org.apache.beam.runners.direct.DirectTransformExecutor.run (DirectTransformExecutor.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511)
    at java.util.concurrent.FutureTask.run (FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
    at java.lang.Thread.run (Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.elasticsearch.client.RestClient.performRequest(Ljava/lang/String;Ljava/lang/String;[Lorg/apache/http/Header;)Lorg/elasticsearch/client/Response;
    at org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.getBackendVersion (ElasticsearchIO.java:1348)
    at org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$Write$WriteFn.setup (ElasticsearchIO.java:1200)

What I added to the pom.xml is:

  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>${beam.version}</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.elasticsearch.client/elasticsearch-rest-client -->
  <dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>${elastic.version}</version>
  </dependency>

I'm stuck on this problem and I don't know how to solve it. If I use a JestClient, I'm able to connect to ElasticSearch without any issue.

Do you have any suggestions?

You are using a newer version of RestClient that no longer has the method performRequest(String, String, Header...). If you look at the latest source code, you can see that the method now takes a Request, whereas older versions had overloads that took Strings and Headers. Those overloads were deprecated and then removed from the code on September 1, 2018.
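One way to confirm which RestClient actually ends up on the classpath is to list its performRequest overloads with reflection. The following is only a minimal sketch: the class name ClasspathCheck and its helper methods are hypothetical names chosen for illustration, not part of Beam or the Elasticsearch client.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    /** Lists the public methods named methodName on className; empty if the class cannot be loaded. */
    public static List<String> overloads(String className, String methodName) {
        List<String> found = new ArrayList<>();
        try {
            // Load without running static initializers; only the method table is needed.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    found.add(m.toString());
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing or not loadable: report no overloads.
        }
        return found;
    }

    /** Returns the jar or directory a class was loaded from, to help spot the version in use. */
    public static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String restClient = "org.elasticsearch.client.RestClient";
        System.out.println("Loaded from: " + locationOf(restClient));
        for (String signature : overloads(restClient, "performRequest")) {
            System.out.println(signature);
        }
    }
}
```

If the output shows only an overload that takes a Request, the client on the classpath is one of the newer versions, which would be consistent with the NoSuchMethodError in the question.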

Either change your code to use the newer Elastic Search library, or specify an older version of the library that is compatible with your code (it needs to be earlier than 7.0.x, e.g. 6.8.4).
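As a sketch of the second option, and assuming the elastic.version property referenced in the question's pom.xml is defined in the properties block of that same file, pinning the client to a 6.x release might look like this:

```xml
<properties>
  <!-- Assumption: elastic.version is the property used by the elasticsearch-rest-client dependency -->
  <!-- 6.8.4 is the example version given above; any release earlier than 7.0.x should still have the older overloads -->
  <elastic.version>6.8.4</elastic.version>
</properties>
```

Running mvn dependency:tree -Dincludes=org.elasticsearch.client afterwards should show which version Maven actually resolves.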

