Local Pubsub Emulator won't work with Dataflow

I am developing a Dataflow pipeline in Java whose input comes from Pubsub. I later found a guide on using the local Pubsub emulator, so that I would not need to deploy to GCP in order to test.

Here is my simple code:

private interface Options extends PipelineOptions, PubsubOptions, StreamingOptions {

    @Description("Pub/Sub topic to read messages from")
    String getTopic();
    void setTopic(String topic);

    @Description("Pub/Sub subscription to read messages from")
    String getSubscription();
    void setSubscription(String subscription);

    @Description("Local file output")
    String getOutput();
    void setOutput(String output);
}

public static void main(String[] args) {

    Options options = PipelineOptionsFactory
            .fromArgs(args)
            .withValidation()
            .as(Options.class);
    options.setStreaming(true);
    options.setPubsubRootUrl("localhost:8085");

    Pipeline pipeline = Pipeline.create(options);
    pipeline
        .apply("IngestFromPubsub", PubsubIO.readStrings().fromTopic(options.getTopic()))
        // other .apply's

    pipeline.run();

}

I was able to follow the guide, including the part that uses the example Python code to create a topic, a subscription, and a publisher, and to publish messages. When the Python code talks to the Pubsub emulator, the console where the emulator runs prints the message "Detected HTTP/2 connection":

Executing: cmd /c C:\...\google-cloud-sdk\platform\pubsub-emulator\bin\cloud-pubsub-emulator.bat --host=localhost --port=8085
[pubsub] This is the Google Pub/Sub fake.
[pubsub] Implementation may be incomplete or differ from the real system.
[pubsub] Apr 10, 2020 3:33:26 PM com.google.cloud.pubsub.testing.v1.Main main
[pubsub] INFO: IAM integration is disabled. IAM policy methods and ACL checks are not supported
[pubsub] Apr 10, 2020 3:33:26 PM io.gapi.emulators.netty.NettyUtil applyJava7LongHostnameWorkaround
[pubsub] INFO: Unable to apply Java 7 long hostname workaround.
[pubsub] Apr 10, 2020 3:33:27 PM com.google.cloud.pubsub.testing.v1.Main main
[pubsub] INFO: Server started, listening on 8085
[pubsub] Apr 10, 2020 3:34:38 PM io.gapi.emulators.grpc.GrpcServer$3 operationComplete
[pubsub] INFO: Adding handler(s) to newly registered Channel.
[pubsub] Apr 10, 2020 3:34:38 PM io.gapi.emulators.netty.HttpVersionRoutingHandler channelRead
[pubsub] INFO: Detected HTTP/2 connection.
[pubsub] Apr 10, 2020 3:34:52 PM io.gapi.emulators.grpc.GrpcServer$3 operationComplete
[pubsub] INFO: Adding handler(s) to newly registered Channel.
[pubsub] Apr 10, 2020 3:34:52 PM io.gapi.emulators.netty.HttpVersionRoutingHandler channelRead
[pubsub] INFO: Detected HTTP/2 connection.

I compiled and ran the code in Eclipse using a Dataflow Pipeline run configuration, but it fails with the following exception:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to create subscription: 
...
Caused by: java.lang.RuntimeException: Failed to create subscription: 
    at org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource.createRandomSubscription(PubsubUnboundedSource.java:1427)
...
Caused by: java.lang.IllegalArgumentException: java.net.MalformedURLException: unknown protocol: localhost
...
Caused by: java.net.MalformedURLException: unknown protocol: localhost

When I add the http:// prefix to the URL in the line options.setPubsubRootUrl("localhost:8085"), I get an endlessly repeated exception instead:

com.google.api.client.http.HttpRequest execute
WARNING: exception thrown while executing request
java.net.ConnectException: Connection refused: connect
    at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
    at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.SocksSocketImpl.connect(Unknown Source)

The requests seem to reach the Pubsub emulator but cannot complete a connection, because the console where the emulator runs also prints this over and over:

[pubsub] Apr 10, 2020 3:49:30 PM io.gapi.emulators.grpc.GrpcServer$3 operationComplete
[pubsub] INFO: Adding handler(s) to newly registered Channel.
[pubsub] Apr 10, 2020 3:49:30 PM io.gapi.emulators.netty.HttpVersionRoutingHandler channelRead
[pubsub] INFO: Detected non-HTTP/2 connection.

How can I make my Dataflow work with Pubsub emulator?

You are attempting to connect to the Pubsub emulator from the Beam Direct Runner, using the Dataflow fork of the Beam 2.5 SDK. The Dataflow 2.5 SDK and the Eclipse plugin were deprecated as of June 6, 2019. Even so, this setup should work.

You need to prefix your PubsubRootUrl with 'http://' in Beam, as you've discovered. The second problem you are seeing indicates that nothing is listening on the address that localhost:8085 resolved to. This is likely because localhost can resolve to two addresses: an IPv4 one (127.0.0.1) and an IPv6 one (::1). The Pubsub emulator only listens on IPv4, and Windows tries IPv6 first. Try replacing localhost with 127.0.0.1 to force IPv4. You should end up with this:

options.setPubsubRootUrl("http://127.0.0.1:8085");
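
If you want to confirm both points on your own machine, a small standalone check along these lines may help. It is only a sketch that uses the JDK and no Beam classes; the class name EmulatorEndpointCheck and its helper methods are made up for illustration, and port 8085 is simply the port from the question. It shows that java.net.URL rejects a root URL without a scheme, and it lists which addresses localhost resolves to and whether anything accepts a connection on each of them.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MalformedURLException;
import java.net.Socket;
import java.net.URL;

// Hypothetical helper, not part of Beam or the Pubsub emulator.
public class EmulatorEndpointCheck {

    // Returns true if java.net.URL accepts the string. "localhost:8085" is
    // rejected because the text before the first ':' is read as the protocol.
    static boolean isValidRootUrl(String candidate) {
        try {
            new URL(candidate);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Labels an address as IPv4 or IPv6.
    static String family(InetAddress address) {
        return (address instanceof Inet6Address) ? "IPv6" : "IPv4";
    }

    // Tries a plain TCP connect; false means refused, unreachable, or timed out.
    static boolean canConnect(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("localhost:8085 accepted as URL: "
                + isValidRootUrl("localhost:8085"));
        System.out.println("http://127.0.0.1:8085 accepted as URL: "
                + isValidRootUrl("http://127.0.0.1:8085"));

        // List every address "localhost" resolves to and probe the emulator port.
        for (InetAddress address : InetAddress.getAllByName("localhost")) {
            System.out.println(family(address) + " " + address.getHostAddress()
                    + " accepts connections on 8085: "
                    + canConnect(address, 8085, 1000));
        }
    }
}
```

Run it while the emulator is up. If the IPv6 entry reports false and the IPv4 entry reports true, that would be consistent with the explanation above, and using 127.0.0.1 in the root URL should avoid the problem.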

Besides setting the root URL, you also need to provide a credential factory. The emulator does not require any credentials, so the factory can return none. You can set it either in code (by setting the option manually) or by passing command-line options. The latter keeps your code clean.

Code:

options.setPubsubRootUrl("http://127.0.0.1:8085");
options.setCredentialFactoryClass(NoCredentialsFactory.class);

Command line options:

--pubsubRootUrl=http://127.0.0.1:8085
--credentialFactoryClass=ca.dataedu.dataflow.otlpdemo.NoCredentialsFactory

The NoCredentialsFactory code is something like:

import com.google.auth.Credentials;
import org.apache.beam.sdk.extensions.gcp.auth.CredentialFactory;
import org.apache.beam.sdk.options.PipelineOptions;
import org.checkerframework.checker.nullness.qual.Nullable;

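public class NoCredentialsFactory implements CredentialFactory {

    // A single shared instance is enough, since the factory holds no state.
    private static final NoCredentialsFactory INSTANCE = new NoCredentialsFactory();

    // Static factory method that takes the pipeline options.
    public static NoCredentialsFactory fromOptions(PipelineOptions options) {
        return INSTANCE;
    }

    // The emulator needs no authentication, so no credential is returned.
    @Override
    public @Nullable Credentials getCredential() {
        return null;
    }
}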
