
Apache Beam Streaming from Pub/Sub to ElasticSearch

I'm writing a Java streaming pipeline with Apache Beam that reads messages from Google Cloud Pub/Sub and should write them into an Elasticsearch instance. Currently I'm using the direct runner, but the plan is to deploy the solution on Google Cloud Dataflow.

First of all, I wrote a pipeline that reads from Pub/Sub and writes to text files, and it works. Then I set up the Elasticsearch instance, and that works too: I wrote some documents with curl and it was easy.

Then, when I tried to perform the write with Beam's Elasticsearch connector, I started getting an error. Specifically, I get java.lang.NoSuchMethodError: org.elasticsearch.client.RestClient.performRequest, even though I added the dependency to my pom.xml file.

What I'm doing is essentially this:

messages
    .apply(
        "TwoMinWindow",
        Window.into(FixedWindows.of(new Duration(120 * 1000))))
    .apply(
        "ElasticWrite",
        ElasticsearchIO.write()
            .withConnectionConfiguration(
                ElasticsearchIO.ConnectionConfiguration
                    .create(new String[] {"http://xxx.xxx.xxx.xxx:9200"}, "streaming_data", "string")
                    .withUsername("xxxx")
                    .withPassword("xxxxxxxx")));

Using the DirectRunner, I'm able to connect to Pub/Sub, but I get an exception when the pipeline tries to connect to the Elasticsearch instance:

java.lang.NoSuchMethodError: org.elasticsearch.client.RestClient.performRequest(Ljava/lang/String;Ljava/lang/String;[Lorg/apache/http/Header;)Lorg/elasticsearch/client/Response;
    at org.apache.beam.sdk.util.UserCodeException.wrap (UserCodeException.java:34)
    at org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$Write$WriteFn$DoFnInvoker.invokeSetup (Unknown Source)
    at org.apache.beam.sdk.transforms.reflect.DoFnInvokers.tryInvokeSetupFor (DoFnInvokers.java:50)
    at org.apache.beam.runners.direct.DoFnLifecycleManager$DeserializingCacheLoader.load (DoFnLifecycleManager.java:104)
    at org.apache.beam.runners.direct.DoFnLifecycleManager$DeserializingCacheLoader.load (DoFnLifecycleManager.java:91)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture (LocalCache.java:3528)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync (LocalCache.java:2277)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad (LocalCache.java:2154)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get (LocalCache.java:2044)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get (LocalCache.java:3952)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad (LocalCache.java:3974)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get (LocalCache.java:4958)
    at org.apache.beam.runners.direct.DoFnLifecycleManager.get (DoFnLifecycleManager.java:61)
    at org.apache.beam.runners.direct.ParDoEvaluatorFactory.createEvaluator (ParDoEvaluatorFactory.java:129)
    at org.apache.beam.runners.direct.ParDoEvaluatorFactory.forApplication (ParDoEvaluatorFactory.java:79)
    at org.apache.beam.runners.direct.TransformEvaluatorRegistry.forApplication (TransformEvaluatorRegistry.java:169)
    at org.apache.beam.runners.direct.DirectTransformExecutor.run (DirectTransformExecutor.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511)
    at java.util.concurrent.FutureTask.run (FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
    at java.lang.Thread.run (Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.elasticsearch.client.RestClient.performRequest(Ljava/lang/String;Ljava/lang/String;[Lorg/apache/http/Header;)Lorg/elasticsearch/client/Response;
    at org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO.getBackendVersion (ElasticsearchIO.java:1348)
    at org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$Write$WriteFn.setup (ElasticsearchIO.java:1200)

What I added in the pom.xml is:

  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>${beam.version}</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.elasticsearch.client/elasticsearch-rest-client -->
  <dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>${elastic.version}</version>
  </dependency>

I'm stuck with this problem and I don't know how to solve it. If I use a JestClient, I'm able to connect to ElasticSearch without any issue.

Do you have any suggestions?

You are using a newer version of RestClient that does not have the method performRequest(String, String, Header...), which is the signature named in the stack trace. If you look at the latest source code, you can see that the method now takes a Request, whereas older versions had overloads that took Strings and Headers. Those overloads were deprecated and then removed from the code on September 1, 2018.
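The string in parentheses in the NoSuchMethodError is a JVM method descriptor, and it spells out exactly which overload the caller was compiled against. As a rough sketch for reading it, the small helper below turns a descriptor into a readable signature. The class name DescriptorDecoder and its methods are made up for this illustration and are not part of Beam or Elasticsearch.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorDecoder {

    /** Converts a JVM method descriptor, e.g. "(I)V", into "returnType (paramTypes)". */
    public static String decode(String descriptor) {
        int close = descriptor.indexOf(')');
        if (!descriptor.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        int[] pos = {1}; // cursor, starts just after '('
        while (pos[0] < close) {
            params.add(readType(descriptor, pos));
        }
        pos[0] = close + 1; // the return type follows ')'
        String returnType = readType(descriptor, pos);
        return returnType + " (" + String.join(", ", params) + ")";
    }

    /** Reads one type starting at pos[0] and advances the cursor past it. */
    private static String readType(String d, int[] pos) {
        int dims = 0;
        while (d.charAt(pos[0]) == '[') { // each '[' is one array dimension
            dims++;
            pos[0]++;
        }
        char code = d.charAt(pos[0]++);
        String base;
        switch (code) {
            case 'L': { // object type: L<internal/class/Name>;
                int semi = d.indexOf(';', pos[0]);
                base = d.substring(pos[0], semi).replace('/', '.');
                pos[0] = semi + 1;
                break;
            }
            case 'V': base = "void"; break;
            case 'Z': base = "boolean"; break;
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'S': base = "short"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'F': base = "float"; break;
            case 'D': base = "double"; break;
            default:
                throw new IllegalArgumentException("Unknown type code: " + code);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Descriptor copied from the stack trace in the question.
        String fromStackTrace =
            "(Ljava/lang/String;Ljava/lang/String;[Lorg/apache/http/Header;)"
                + "Lorg/elasticsearch/client/Response;";
        System.out.println(decode(fromStackTrace));
    }
}
```

For the descriptor in the question this gives a method that takes two Strings and a Header array and returns a Response, which is the String-based form of performRequest that newer clients no longer provide.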

Either change your code to use the newer Elasticsearch library, or specify an older version of the library that is compatible with your code (it needs to be earlier than 7.0.x, e.g. 6.8.4).
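If you are not sure which client version actually ends up on the classpath, one way to check is with reflection from inside the same project. The sketch below is only an illustration: ClasspathCheck, overloads and origin are hypothetical names, and it uses nothing but JDK reflection calls. It lists the public overloads of a method and reports which jar the class was loaded from.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    /**
     * Returns the signatures of all public methods called methodName on className,
     * or an empty list if the class cannot be loaded from the classpath.
     */
    public static List<String> overloads(String className, String methodName) {
        List<String> found = new ArrayList<>();
        try {
            // 'false' avoids running static initializers; we only inspect the class.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    found.add(m.toGenericString());
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of the types in its signatures) is missing: report nothing.
        }
        return found;
    }

    /** Returns the location (usually a jar) the class was loaded from. */
    public static String origin(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes have no code source.
            return src == null ? "(JDK / bootstrap)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found)";
        }
    }

    public static void main(String[] args) {
        // Defaults match the class and method named in the stack trace.
        String cls = args.length > 0 ? args[0] : "org.elasticsearch.client.RestClient";
        String method = args.length > 1 ? args[1] : "performRequest";
        System.out.println(cls + " loaded from: " + origin(cls));
        for (String signature : overloads(cls, method)) {
            System.out.println("  " + signature);
        }
    }
}
```

Run it with the same classpath as the pipeline. If none of the printed performRequest overloads takes (String, String, Header...), the resolved client is too new for the connector, and the jar path printed on the first line shows which version was picked up. Running mvn dependency:tree is another way to see where that version comes from.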


 