Errors connecting to AWS Keyspaces using a Lambda layer

I am intermittently getting the following error when connecting to an AWS keyspace using a Lambda layer:

All host(s) tried for query failed. First host tried, 3.248.244.53:9142: Host considered as DOWN. See innerErrors.

I am trying to query a table in a keyspace from a Node.js Lambda function as follows:

import cassandra from 'cassandra-driver';
import fs from 'fs';

export default class AmazonKeyspace {

  tpmsClient = null;

  constructor () {
    let auth = new cassandra.auth.PlainTextAuthProvider('cass-user-at-xxxxxxxxxx', 'zzzzzzzzz');
    let sslOptions1 = {
      ca: [ fs.readFileSync('/opt/utils/AmazonRootCA1.pem', 'utf-8')],
      host: 'cassandra.eu-west-1.amazonaws.com',
      rejectUnauthorized: true
    };
    this.tpmsClient = new cassandra.Client({
      contactPoints: ['cassandra.eu-west-1.amazonaws.com'],
      localDataCenter: 'eu-west-1',
      authProvider: auth,
      sslOptions: sslOptions1,
      keyspace: 'tpms',
      protocolOptions: { port: 9142 }
    });
  }

  getOrganisation = async (orgKey) => {
    const SQL = 'select * FROM organisation where organisation_id=?;';

    return new Promise((resolve, reject) => {
      this.tpmsClient.execute(SQL, [orgKey], {prepare: true}, (err, result) => {
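        // Reject with the full Error (not just err.message) so that driver
        // details such as innerErrors and isSocketError are kept for logging.
        if (err) reject(err);
        else resolve(result.rows);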
      });
    });
  };
}

I am basically following the recommended AWS documentation: https://docs.aws.amazon.com/keyspaces/latest/devguide/using_nodejs_driver.html

It seems that around 10-20% of the time the Lambda function (via the Cassandra driver) cannot connect to the endpoint.

I am pretty familiar with Cassandra (I already run a 6-node cluster that I manage myself) and have no issues with it.

Could this be a timeout, or do I need more contact points?

I followed the recommended guides and checked the AWS console for errors, but none were shown.

UPDATE:

I occasionally get the error below (about 1 call in 50 when I invoke the function in parallel, with 5 concurrent calls):

"All host(s) tried for query failed. First host tried, 3.248.244.5:9142: DriverError: Socket was closed
    at Connection.clearAndInvokePending (/opt/node_modules/cassandra-driver/lib/connection.js:265:15)
    at Connection.close (/opt/node_modules/cassandra-driver/lib/connection.js:618:8)
    at TLSSocket.<anonymous> (/opt/node_modules/cassandra-driver/lib/connection.js:93:10)
    at TLSSocket.emit (node:events:525:35)
    at node:net:313:12
    at TCP.done (node:_tls_wrap:587:7) { info: 'Cassandra Driver Error', isSocketError: true, coordinator: '3.248.244.5:9142'}"

This exception may be caused by throttling on the Amazon Keyspaces side, resulting in the DriverError you are seeing sporadically.

I would suggest taking a look at this repo, which should help you put measures in place to either prevent this issue or at least reveal the true cause of the exception.
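In the Node.js driver, the "All host(s) tried for query failed ... See innerErrors" message comes from a NoHostAvailableError, whose innerErrors property holds the error returned by each host that was tried. The question's getOrganisation rejects with only err.message, which discards those details. The sketch below shows one way to log them; it assumes innerErrors is a plain object keyed by "address:port" (as in driver v4), and the helper name describeInnerErrors is hypothetical, not a driver API.

```javascript
// Hypothetical helper: turn a driver error's per-host innerErrors into plain
// records that are easy to log (e.g. to CloudWatch Logs via console.error).
// Assumes innerErrors, when present, is an object keyed by "address:port".
function describeInnerErrors(err) {
  if (!err || typeof err.innerErrors !== 'object' || err.innerErrors === null) {
    return []; // not a NoHostAvailableError-style error
  }
  return Object.entries(err.innerErrors).map(([host, inner]) => ({
    host,
    name: inner && inner.name,
    message: inner && inner.message,
    // Socket-level failures such as "Socket was closed" set isSocketError.
    isSocketError: Boolean(inner && inner.isSocketError),
  }));
}

// Example with a stand-in error shaped like the one in the question.
const example = new Error('All host(s) tried for query failed.');
example.name = 'NoHostAvailableError';
example.innerErrors = {
  '3.248.244.5:9142': Object.assign(new Error('Socket was closed'), { isSocketError: true }),
};
console.error(JSON.stringify(describeInnerErrors(example)));
```

Logging every inner error, rather than only the first host's message, makes it easier to tell a socket error from a timeout or a server-side error when correlating with CloudWatch metrics.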

For some of the errors you see in the logs, you will need to investigate Amazon CloudWatch metrics to determine whether you have throttling or system errors. I've built this AWS CloudFormation template to deploy a CloudWatch dashboard with all the appropriate metrics, which will provide better observability for your application.

A system error indicates an event that must be resolved by AWS and is often part of normal operations: timeouts, server faults, or scaling activity can all result in server errors. A user error indicates an event that can often be resolved by the user, such as an invalid query or exceeding a capacity quota. Amazon Keyspaces passes a system error back as a Cassandra ServerError. In most cases this is a transient error, so you can retry the request until it succeeds.

With the Cassandra driver's default retry policy, you can also see NoHostAvailableException or AllNodesFailedException (the Node.js driver raises NoHostAvailableError), or messages like yours: "All host(s) tried for query failed". This is a client-side exception that is thrown once every host in the load balancing policy's query plan has attempted the request.
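Acting on the system-error vs. user-error distinction in application code requires classifying errors. The sketch below assumes two things about cassandra-driver errors: socket failures carry isSocketError: true (as in the error in the question), and server responses carry a numeric code from the CQL native protocol (0x0000 Server error, 0x1000 Unavailable, 0x1001 Overloaded, 0x1100 Write timeout, 0x1200 Read timeout, 0x2000 Syntax error, and so on). Which of these codes Amazon Keyspaces actually returns when it throttles is not confirmed here, so treat the retryable set as a starting point. The function name isTransientError is hypothetical.

```javascript
// CQL native protocol error codes that usually indicate a transient,
// server-side condition worth retrying (assumed set; tune for Keyspaces).
const RETRYABLE_CODES = new Set([
  0x0000, // Server error
  0x1000, // Unavailable
  0x1001, // Overloaded
  0x1100, // Write timeout
  0x1200, // Read timeout
]);

// Hypothetical classifier: true if retrying the same request later may succeed.
function isTransientError(err) {
  if (!err) return false;
  // Socket-level failures, e.g. "Socket was closed" in the question.
  if (err.isSocketError) return true;
  // Client-side request timeout raised by the driver.
  if (err.name === 'OperationTimedOutError') return true;
  // NoHostAvailableError: transient only if every per-host error is transient.
  if (err.innerErrors && typeof err.innerErrors === 'object') {
    const inner = Object.values(err.innerErrors);
    return inner.length > 0 && inner.every(isTransientError);
  }
  // Server responses carry a numeric protocol error code.
  if (typeof err.code === 'number') return RETRYABLE_CODES.has(err.code);
  // Anything else (syntax errors, invalid queries, auth failures, unknown
  // errors) is treated as a user error and not retried.
  return false;
}

console.log(isTransientError({ isSocketError: true }));                          // true
console.log(isTransientError({ name: 'ResponseError', code: 0x2000 }));          // false: syntax error
console.log(isTransientError({ innerErrors: { 'a:9142': { code: 0x1001 } } }));  // true: overloaded
```

Unknown errors default to "not transient", which fails fast; inspecting innerErrors in your logs first will show which additional cases, if any, are safe to retry.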

Take a look at this retry policy for Node.js, which should help resolve your "All host(s) tried for query failed" exception or pass back the original exception.

The retry policies in the Cassandra drivers are pretty crude and cannot do more sophisticated things such as circuit breaker patterns. You may eventually want to use a "failfast" retry policy for the driver and handle the exceptions in your application code.
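One way to handle the exceptions in application code is a small retry loop around each query. The sketch below is a generic retry helper with capped exponential backoff and "full jitter"; it is not a cassandra-driver API. The names withRetry, maxAttempts, baseDelayMs, maxDelayMs and shouldRetry are hypothetical, and the default shouldRetry only retries errors flagged isSocketError, like the "Socket was closed" error in the question. A circuit breaker, which would stop calling Keyspaces for a while after repeated failures, is not shown.

```javascript
// Hypothetical helper: retry an async operation with capped exponential
// backoff and full jitter. Not part of cassandra-driver.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(operation, {
  maxAttempts = 3,   // total attempts, including the first
  baseDelayMs = 50,  // delay scale for the first retry
  maxDelayMs = 1000, // upper bound on any single delay
  shouldRetry = (err) => Boolean(err && err.isSocketError),
} = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on the last attempt or on non-retryable errors, rethrowing
      // the original error so its details (e.g. innerErrors) are not lost.
      if (attempt >= maxAttempts || !shouldRetry(err)) throw err;
      // Full jitter: wait a random time in [0, min(maxDelayMs, base * 2^(attempt-1))).
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.random() * cap);
    }
  }
}

// Usage (assuming `client` is a cassandra.Client configured as in the question;
// execute() returns a Promise when no callback is passed):
//   const result = await withRetry(() =>
//     client.execute('select * FROM organisation where organisation_id=?',
//                    [orgKey], { prepare: true }));
```

Randomizing the delay spreads out retries from concurrent Lambda invocations, so parallel calls that failed together do not all hit the endpoint again at the same moment.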
