
High latency when fetching messages from Google Pub/Sub (pull model) with the Java client

We have an application that routes BigQuery logs to a Pub/Sub topic, and there is one pull subscription on that topic. I have implemented synchronous pull with lease management and set maxMessages to 100, so that a single request pulls at most 100 messages from the subscription.

I ran a Terraform script that starts hundreds of jobs concurrently, which generates hundreds of logs within a few seconds. My pull mechanism fetches messages every 30 seconds. Since there are a lot of logs, I expected 100 messages per request, but that is not what happens: each request returns an arbitrary number of messages, such as 14, 7, 10, or 3. As a result, it takes a long time to fetch all the messages for my jobs, and I don't know what exactly the issue is.

Please help me understand how to get the expected number of messages and reduce latency when there are many messages on the topic. Is the Pub/Sub service throttling the response, or is there some configuration I need to change on my end?

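import com.google.api.gax.core.CredentialsProvider;
import com.google.cloud.pubsub.v1.stub.GrpcSubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStub;
import com.google.cloud.pubsub.v1.stub.SubscriberStubSettings;
import com.google.pubsub.v1.AcknowledgeRequest;
import com.google.pubsub.v1.ModifyAckDeadlineRequest;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PullRequest;
import com.google.pubsub.v1.PullResponse;
import com.google.pubsub.v1.ReceivedMessage;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public List<ReceivedMessage> getMessagesFromSubscription(String projectId, String subscriptionId, int numOfMessages,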
                                                         CredentialsProvider credentialsProvider) {
 
    List<ReceivedMessage> receivedMessages = new ArrayList<>();
    try {
        SubscriberStubSettings subscriberStubSettings = getSubscriberStubSettings(credentialsProvider);
        try (SubscriberStub subscriber = GrpcSubscriberStub.create(subscriberStubSettings)) {
            String subscriptionName = ProjectSubscriptionName.format(projectId, subscriptionId);
            PullRequest pullRequest = PullRequest.newBuilder()
                    .setMaxMessages(100)
                    .setSubscription(subscriptionName)
                    .build();
            PullResponse pullResponse = subscriber.pullCallable().call(pullRequest);
            List<String> ackIds = new ArrayList<>();
            for (ReceivedMessage message : pullResponse.getReceivedMessagesList()) {
                ackIds.add(message.getAckId());
                ModifyAckDeadlineRequest modifyAckDeadlineRequest = ModifyAckDeadlineRequest.newBuilder()
                        .setSubscription(subscriptionName)
                        .addAckIds(message.getAckId())
                        .setAckDeadlineSeconds(30)
                        .build();
                subscriber.modifyAckDeadlineCallable().call(modifyAckDeadlineRequest);
            }
            if (ackIds.isEmpty()) {
                // my logic
            } else {
                AcknowledgeRequest acknowledgeRequest = AcknowledgeRequest.newBuilder()
                        .setSubscription(subscriptionName)
                        .addAllAckIds(ackIds)
                        .build();
                subscriber.acknowledgeCallable().call(acknowledgeRequest);
                receivedMessages = new ArrayList<>(pullResponse.getReceivedMessagesList());
            }
        }
        LOG.info("getMessagesFromSubscription: Received {} Messages for Project Id: {} and" +
                " Subscription Id: {}.", receivedMessages.size(), projectId, subscriptionId);
    } catch (Exception e) {
        LOG.error("getMessagesFromSubscription: Error while pulling message from Pub/Sub " +
                "from Project ID: {} and Subscription ID: {}", projectId, subscriptionId, e);
    }
    return receivedMessages;
}

private SubscriberStubSettings getSubscriberStubSettings(CredentialsProvider credentialsProvider) throws IOException {
    SubscriberStubSettings.Builder subscriberStubSettingsBuilder = SubscriberStubSettings
            .newBuilder()
            .setTransportChannelProvider(SubscriberStubSettings
                    .defaultGrpcTransportProviderBuilder()
                    .setMaxInboundMessageSize(20 << 20)
                    .build());
    if (credentialsProvider != null) {
        subscriberStubSettingsBuilder = subscriberStubSettingsBuilder.setCredentialsProvider(credentialsProvider);
    }
    return subscriberStubSettingsBuilder.build();
}

The documentation specifically calls out the need to have many pull requests outstanding simultaneously to achieve high throughput and low latency:

"To achieve low message delivery latency with synchronous pull, it is important to have many simultaneously outstanding pull requests. As the throughput of the topic increases, more pull requests are necessary. In general, asynchronous pull is preferable for latency-sensitive applications."

Making a single, synchronous pull request at a time is not going to be an efficient way to pull messages. The Pub/Sub service tries to trade off full pull responses with latency, preferring to send some messages back quickly rather than a full pull response, which is why you see fewer than 100 messages returned. If you make a lot more pull requests simultaneously, you are more likely to get full responses as the service will recognize that the subscriber can handle more load and fetch more messages to fulfill the pull requests.
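
The idea of keeping several pull requests outstanding at once can be sketched with plain Java concurrency utilities. This is only an illustration, not the client library's own mechanism: `ConcurrentPullSketch`, `Puller`, and `pullConcurrently` are hypothetical names introduced here, and `Puller` stands in for one synchronous pull call so that the sketch runs without a Pub/Sub connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConcurrentPullSketch {

    /** Hypothetical stand-in for one synchronous pull; returns one batch of messages. */
    interface Puller<T> {
        List<T> pull(int maxMessages);
    }

    /** Issues parallelPulls pull calls at the same time and merges their results. */
    static <T> List<T> pullConcurrently(Puller<T> puller, int parallelPulls, int maxMessages) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelPulls);
        try {
            List<Callable<List<T>>> tasks = new ArrayList<>();
            for (int i = 0; i < parallelPulls; i++) {
                tasks.add(() -> puller.pull(maxMessages));
            }
            List<T> all = new ArrayList<>();
            // invokeAll blocks until every pull has returned (or failed).
            for (Future<List<T>> future : pool.invokeAll(tasks)) {
                all.addAll(future.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while pulling", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A pull request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the code from the question, the `Puller` could plausibly be `max -> subscriber.pullCallable().call(pullRequest).getReceivedMessagesList()`, using one `SubscriberStub` that is created once and shared across calls rather than rebuilt on every invocation. The number of parallel pulls is a tuning parameter that depends on the topic's throughput.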

If you want to maximize throughput and minimize latency, you should instead use asynchronous streaming pull.
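
A minimal sketch of asynchronous streaming pull with the `Subscriber` class from the `google-cloud-pubsub` Java client might look like the following. The class name `StreamingPullSketch`, the `listenSeconds` parameter, and the flow-control limits are illustrative assumptions, not values taken from the question; it also assumes application default credentials are available.

```java
import com.google.api.gax.batching.FlowControlSettings;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StreamingPullSketch {

    static void receive(String projectId, String subscriptionId, long listenSeconds) {
        ProjectSubscriptionName subscriptionName =
                ProjectSubscriptionName.of(projectId, subscriptionId);

        // Called by the client library for each message as it arrives.
        MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
            System.out.println("Received message " + message.getMessageId());
            consumer.ack();
        };

        // Illustrative limits on how much unacknowledged work may be held at once.
        FlowControlSettings flowControl = FlowControlSettings.newBuilder()
                .setMaxOutstandingElementCount(1000L)
                .setMaxOutstandingRequestBytes(100L * 1024L * 1024L)
                .build();

        Subscriber subscriber = Subscriber.newBuilder(subscriptionName, receiver)
                .setFlowControlSettings(flowControl)
                .build();

        // Start receiving, then stop after listenSeconds.
        subscriber.startAsync().awaitRunning();
        try {
            subscriber.awaitTerminated(listenSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            subscriber.stopAsync();
        }
    }
}
```

Here the library keeps the stream open and extends message leases itself, so the explicit `ModifyAckDeadlineRequest` loop from the question would not be needed. A long-running service would typically keep the subscriber running instead of stopping it after a fixed interval.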
