
Does the Python gRPC client support retry?

I'm using gRPC in Python, and I found that communication between two nodes occasionally fails with StatusCode.UNAVAILABLE.

I found a solution which says that UNAVAILABLE is a retryable error and that we should retry: https://github.com/grpc/grpc/issues/16515 .

So I looked up the documentation and found this: https://github.com/grpc/proposal/blob/master/A6-client-retries.md . It gives a sample config, shown below.

"retryPolicy": {
  "maxAttempts": 4,
  "initialBackoff": "0.1s",
  "maxBackoff": "1s",
  "backoffMultiplier": 2,
  "retryableStatusCodes": [
    "UNAVAILABLE"
  ]
}
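To make these fields concrete, here is a minimal sketch of how such a policy bounds the delay before each retry. It assumes the formula described in the A6 proposal, where the n-th retry waits random(0, min(initialBackoff * backoffMultiplier^(n-1), maxBackoff)). The helper names `backoff_upper_bounds` and `sample_delays` are made up for illustration, and actual gRPC releases may apply jitter differently.

```python
import random


def backoff_upper_bounds(max_attempts, initial_backoff, max_backoff, multiplier):
    """Upper bound (in seconds) of the delay before each retry.

    max_attempts counts the original call, so there are at most
    max_attempts - 1 retries.
    """
    bounds = []
    for n in range(1, max_attempts):
        bounds.append(min(initial_backoff * multiplier ** (n - 1), max_backoff))
    return bounds


def sample_delays(bounds, rng=random):
    """Pick a random delay in [0, bound] for each retry (assumed A6 formula)."""
    return [rng.uniform(0, b) for b in bounds]


# Values taken from the retryPolicy above: 4 attempts, 0.1s, 1s, multiplier 2.
bounds = backoff_upper_bounds(4, 0.1, 1.0, 2)
print(bounds)  # [0.1, 0.2, 0.4]
print(sample_delays(bounds))
```

With maxAttempts set to 4 there are at most three retries, and maxBackoff only starts to matter once the exponential term exceeds 1 second.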

I tried following the two examples in this question, but it still doesn't work: Use retryPolicy with python GRPC client

Here is my code. There is another problem here, too: I don't quite understand the meaning of "<package>.<service>":

json_config = json.dumps(
    {
        "methodConfig": [
            {
                # "name": [{"service": "<package>.<service>"}],
                "retryPolicy": {
                    "maxAttempts": 5,
                    "initialBackoff": "0.1s",
                    "maxBackoff": "10s",
                    "backoffMultiplier": 2,
                    "retryableStatusCodes": ["UNAVAILABLE"],
                },
            }
        ]
    }
)

options = [
    ('grpc.service_config', json_config)
]
taf_grpc_client = GrpcRpcClient(RpcConfig(taf_server_host, self._taf_server_port, options=options),
                                taf_server_proto_pb2_grpc.TafServerStub)
self._taf_grpc_client_dict[taf_server_host] = taf_grpc_client

What I want to know is whether or not Python GRPC supports "retry", and what's the proper usage of it.

The service config you specified is correct. Since there isn't a reproduction case, I applied your code to our HelloWorld example:

import json

import grpc
import helloworld_pb2
import helloworld_pb2_grpc


async def run() -> None:
    json_config = json.dumps({
        "methodConfig": [{
            "name": [{
                "service": "helloworld.Greeter"
            }],
            "retryPolicy": {
                "maxAttempts": 5,
                "initialBackoff": "0.1s",
                "maxBackoff": "10s",
                "backoffMultiplier": 2,
                "retryableStatusCodes": ["UNAVAILABLE"],
            },
        }]
    })
    async with grpc.aio.insecure_channel('localhost:50051',
                                         options=(('grpc.service_config',
                                                   json_config),)) as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        response = await stub.SayHello(helloworld_pb2.HelloRequest(name='you'))
    print("Greeter client received: " + response.message)

If you run it with the environment variable GRPC_VERBOSITY=debug, you should observe multiple attempts to retry the connection. If there are other issues, please file an issue at https://github.com/grpc/grpc/issues .
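On the meaning of "<package>.<service>" in the "name" entry: it is the fully qualified service name, i.e. the proto package, a dot, then the service name, which is "helloworld.Greeter" for the example above. A "name" entry may also carry a "method" key to limit the policy to one method. The sketch below builds both variants; the helper name `retry_service_config` is hypothetical, not part of grpcio.

```python
import json


def retry_service_config(service, method=None, max_attempts=5):
    """Build a service config JSON string with a retry policy.

    `service` is the fully qualified service name (package + "." + service).
    `method` is optional; leaving it out applies the policy to every
    method of that service.
    """
    name = {"service": service}
    if method is not None:
        name["method"] = method
    return json.dumps({
        "methodConfig": [{
            "name": [name],
            "retryPolicy": {
                "maxAttempts": max_attempts,
                "initialBackoff": "0.1s",
                "maxBackoff": "10s",
                "backoffMultiplier": 2,
                "retryableStatusCodes": ["UNAVAILABLE"],
            },
        }]
    })


# Policy for every Greeter method, then for SayHello only.
whole_service = retry_service_config("helloworld.Greeter")
one_method = retry_service_config("helloworld.Greeter", method="SayHello")

# The resulting string is what gets passed as the channel option.
options = (("grpc.service_config", one_method),)
print(options)
```

The `options` tuple has the same shape as the one passed to grpc.aio.insecure_channel in the example above.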

I tested with the code below: when a timeout is set, the retry policy doesn't seem to take effect. In particular, the logs don't explicitly distinguish between retries of the channel connection and retries of the gRPC call itself (i.e. the SayHello call carrying helloworld_pb2.HelloRequest), so it is difficult to tell whether the call is really retried. Please correct me if I have misunderstood anything.

Package version:

  • python 3.9.4
  • grpcio 1.40.0

    json_config = json.dumps({
        "methodConfig": [{
            "name": [{
                "service": "helloworld.Greeter"
            }],
            "retryPolicy": {
                "maxAttempts": 5,
                "initialBackoff": "0.1s",
                "maxBackoff": "2s",
                "backoffMultiplier": 2,
                "retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"],
            },
        }]
    })
    async with grpc.aio.insecure_channel('localhost:50051',
                                         options=(('grpc.service_config',
                                                   json_config),)) as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        try:
            response = await stub.SayHello(helloworld_pb2.HelloRequest(name='you'), timeout=3)
            print("Greeter client received: " + response.message)
        except Exception as e:
            print(e)
        time.sleep(60)

Logs:

D0828 22:22:08.717000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "grpclb"
D0828 22:22:08.720000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "priority_experimental"
D0828 22:22:08.723000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "weighted_target_experimental"
D0828 22:22:08.726000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "pick_first"
D0828 22:22:08.731000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "round_robin"
D0828 22:22:08.733000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "ring_hash_experimental"
D0828 22:22:08.735000000 19852 src/core/ext/filters/client_channel/resolver/dns/native/dns_resolver.cc:320] Using native dns resolver
D0828 22:22:08.738000000 19852 src/core/ext/xds/certificate_provider_registry.cc:33] registering certificate provider factory for "file_watcher"
D0828 22:22:08.740000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "cds_experimental"
D0828 22:22:08.742000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_impl_experimental"
D0828 22:22:08.745000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_resolver_experimental"
D0828 22:22:08.747000000 19852 src/core/ext/filters/client_channel/lb_policy_registry.cc:42] registering LB policy factory for "xds_cluster_manager_experimental"
D0828 22:22:08.753000000 19852 src/core/ext/filters/client_channel/resolver/dns/native/dns_resolver.cc:267] Start resolving.
I0828 22:22:10.806000000 31236 src/core/ext/filters/client_channel/subchannel.cc:1012] Connect failed: {"created":"@1661696530.806000000","description":"OS Error","file":"src/core/lib/iomgr/tcp_client_windows.cc","file_line":106,"os_error":"No connection could be made because the target machine actively refused it.\r\n","syscall":"ConnectEx","wsa_error":10061}
<AioRpcError of RPC that terminated with:
        status = StatusCode.DEADLINE_EXCEEDED
        details = "Deadline Exceeded"
        debug_error_string = "{"created":"@1661696531.755000000","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
>
I0828 22:22:12.988000000 31236 src/core/ext/filters/client_channel/subchannel.cc:1012] Connect failed: {"created":"@1661696532.987000000","description":"OS Error","file":"src/core/lib/iomgr/tcp_client_windows.cc","file_line":106,"os_error":"No connection could be made because the target machine actively refused it.\r\n","syscall":"ConnectEx","wsa_error":10061}
D0828 22:22:13.007000000 31236 src/core/ext/filters/client_channel/resolver/dns/native/dns_resolver.cc:245] In cooldown from last resolution (from 4255 ms ago). Will resolve again in 25745 ms
I0828 22:22:13.013000000 31236 src/core/ext/filters/client_channel/subchannel.cc:955] Subchannel 000001FC0A0AA500: Retry immediately
I0828 22:22:13.015000000 31236 src/core/ext/filters/client_channel/subchannel.cc:980] Failed to connect to channel, retrying
I0828 22:22:15.053000000 31236 src/core/ext/filters/client_channel/subchannel.cc:1012] Connect failed: {"created":"@1661696535.052000000","description":"OS Error","file":"src/core/lib/iomgr/tcp_client_windows.cc","file_line":106,"os_error":"No connection could be made because the target machine actively refused it.\r\n","syscall":"ConnectEx","wsa_error":10061}
I0828 22:22:15.060000000 31236 src/core/ext/filters/client_channel/subchannel.cc:955] Subchannel 000001FC0A1184E0: Retry immediately
I0828 22:22:15.062000000 31236 src/core/ext/filters/client_channel/subchannel.cc:980] Failed to connect to channel, retrying
I0828 22:22:17.091000000 31236 src/core/ext/filters/client_channel/subchannel.cc:1012] Connect failed: {"created":"@1661696537.091000000","description":"OS Error","file":"src/core/lib/iomgr/tcp_client_windows.cc","file_line":106,"os_error":"No connection could be made because the target machine actively refused it.\r\n","syscall":"ConnectEx","wsa_error":10061}
I0828 22:22:17.099000000 31236 src/core/ext/filters/client_channel/subchannel.cc:955] Subchannel 000001FC0A0AA500: Retry immediately
I0828 22:22:17.101000000 31236 src/core/ext/filters/client_channel/subchannel.cc:980] Failed to connect to channel, retrying
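
One possible reading of these logs, offered as an assumption rather than a confirmed diagnosis: the timeout passed to the stub call is a deadline for the whole RPC, retries and backoff included, so a 3-second deadline can expire while the channel is still trying to connect and no further attempt is possible. If a per-attempt timeout is what is wanted, one option is to retry in application code. The sketch below uses only asyncio. `call_with_retry` and `flaky` are hypothetical names; in real code the coroutine would be the stub call and the caught exception would be the gRPC error with a retryable status code.

```python
import asyncio
import random


async def call_with_retry(make_call, attempts=5, per_attempt_timeout=3.0,
                          initial_backoff=0.1, max_backoff=2.0, multiplier=2.0,
                          retry_on=(ConnectionError, asyncio.TimeoutError)):
    """Run make_call() up to `attempts` times, each with its own timeout."""
    backoff = initial_backoff
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(make_call(), per_attempt_timeout)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Sleep a random time up to the current backoff, then grow it.
            await asyncio.sleep(random.uniform(0, backoff))
            backoff = min(backoff * multiplier, max_backoff)


async def main():
    calls = 0

    async def flaky():
        # Stand-in for a stub call: fails twice, then succeeds.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("unavailable")
        return "Hello, you!"

    result = await call_with_retry(flaky, initial_backoff=0.01)
    print(result, "after", calls, "calls")  # Hello, you! after 3 calls


asyncio.run(main())
```

Unlike the service-config retry policy, this loop gives every attempt a fresh timeout, at the cost of handling retries outside the gRPC library.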


 