

Authentication failures in Cassandra when 1 of 16 nodes is down

I have a Cassandra cluster running:

Cassandra 2.0.11.83 | DSE 4.6.0 | CQL spec 3.1.1 | Thrift protocol 19.39.0

The cluster has 18 nodes, split among 3 datacenters, 6 in each. My system_auth keyspace has the following replication defined:

replication = { 'class': 'NetworkTopologyStrategy', 'DC1': '4', 'DC2': '4', 'DC3': '4'}
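As a sanity check on these settings, here is a small Python sketch of the replica arithmetic. It assumes Cassandra's usual quorum definition (a strict majority, RF // 2 + 1) and a simplified availability model in which the node taken down is one of the row's replicas (the worst case); the function names are hypothetical. With 4 replicas per datacenter, one node down in DC1 should still leave system_auth readable at ONE, LOCAL_QUORUM and QUORUM, which hints that the failures were about inconsistent data rather than unavailable replicas.

```python
# Replication map copied from the system_auth keyspace definition above.
REPLICATION = {"DC1": 4, "DC2": 4, "DC3": 4}


def quorum(replicas: int) -> int:
    """Quorum size: a strict majority of the given number of replicas."""
    return replicas // 2 + 1


def replicas_needed(level: str, local_dc: str, replication: dict[str, int]) -> int:
    """Live replicas a read at `level` must reach (simplified model)."""
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(replication[local_dc])
    if level == "QUORUM":
        return quorum(sum(replication.values()))
    raise ValueError(f"unsupported consistency level: {level}")


def still_available(level: str, local_dc: str, replication: dict[str, int],
                    down_in_local_dc: int) -> bool:
    """Can the read succeed with `down_in_local_dc` local replicas down?

    Assumes every other replica, local or remote, is up.
    """
    if level == "LOCAL_QUORUM":
        # LOCAL_QUORUM only counts replicas in the coordinator's datacenter.
        live = replication[local_dc] - down_in_local_dc
    else:
        # ONE and QUORUM may use replicas in any datacenter.
        live = sum(replication.values()) - down_in_local_dc
    return live >= replicas_needed(level, local_dc, replication)


for level in ("ONE", "LOCAL_QUORUM", "QUORUM"):
    print(level, replicas_needed(level, "DC1", REPLICATION),
          still_available(level, "DC1", REPLICATION, 1))
# ONE 1 True
# LOCAL_QUORUM 3 True
# QUORUM 7 True
```

Under this model a second DC1 replica going down would break LOCAL_QUORUM (2 < 3) but still not ONE or QUORUM.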

and my authenticator and authorizer are set to:

authenticator: org.apache.cassandra.auth.PasswordAuthenticator

authorizer: org.apache.cassandra.auth.CassandraAuthorizer

This morning I brought down one of the nodes in DC1 for maintenance. Within a few seconds to a minute, client applications started logging exceptions like this:

"User my_application_user has no MODIFY permission on or any of its parents"

Running 'LIST ALL PERMISSIONS of my_application_user' on one of the other nodes shows that the user has SELECT and MODIFY on the keyspace xxxxx, so I am rather confused. Do I have a setup issue? Is this a bug of some sort?
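One way the permission can show up in LIST ALL PERMISSIONS and still be denied is if replicas of the system_auth data disagree. The toy Python model below assumes that permission checks are read at consistency ONE (my understanding of the 2.0-era CassandraAuthorizer, not something stated in the question) and that one replica never received the GRANT. The `Replica` class, node names and the simple "first live replica" rule are hypothetical; real Cassandra picks the replica via the snitch.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy stand-in for one replica of a system_auth permissions row."""
    name: str
    up: bool = True
    # resource -> permissions this replica holds a copy of
    permissions: dict[str, set[str]] = field(default_factory=dict)


def read_at_one(replicas: list[Replica], resource: str) -> set[str]:
    """Answer from the first live replica in preference order (a read at ONE)."""
    for replica in replicas:
        if replica.up:
            return replica.permissions.get(resource, set())
    raise RuntimeError("no live replica for this row")


# Hypothetical preference order: "a" is closest, "d" is next and is stale.
replicas = [
    Replica("a", permissions={"xxxxx": {"SELECT", "MODIFY"}}),
    Replica("d"),  # missed the GRANT and was never repaired
    Replica("b", permissions={"xxxxx": {"SELECT", "MODIFY"}}),
    Replica("c", permissions={"xxxxx": {"SELECT", "MODIFY"}}),
]

print(sorted(read_at_one(replicas, "xxxxx")))  # ['MODIFY', 'SELECT']
replicas[0].up = False  # take node "a" down for maintenance
print(sorted(read_at_one(replicas, "xxxxx")))  # [] -> "no MODIFY permission"
```

In this model, LIST ALL PERMISSIONS run elsewhere can be served by a healthy replica, so it shows the grant even while permission checks routed to the stale replica fail.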

Re-posting this as the answer, as BrianC suggested above. 如BrianC上文所述,将其重新发布为答案。

So this is resolved. Here's the sequence of events that seems to have fixed it:

  1. Add 18 more nodes
  2. Run cleanup on the original nodes (this was part of the original plan)
  3. Run a scrub on 1 table, since it was throwing exceptions during cleanup
  4. Run a repair of the system_auth keyspace on the originally troubled node
  5. Wait for the repair service to complete a full pass over all keyspaces
  6. Decommission the original 18 nodes
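For anyone repeating the steps above, here is a Python sketch that prints the corresponding nodetool commands in order. The host names, the scrub host and the keyspace/table to scrub are hypothetical placeholders, and the scrub host in particular is an assumption (the post does not say which node's cleanup failed). Steps 1 and 5 (bootstrapping the new nodes and waiting for the DSE repair service) are not nodetool commands, so they appear only as comments.

```python
import shlex

# Hypothetical names; substitute your own hosts, keyspace and table.
ORIGINAL_NODES = [f"old-node-{i:02d}" for i in range(1, 19)]
TROUBLED_NODE = "old-node-01"
SCRUB_TARGET = ("old-node-05", "my_keyspace", "my_table")  # (host, keyspace, table)


def maintenance_plan(original_nodes, troubled_node, scrub_target):
    """Return the nodetool commands for steps 2, 3, 4 and 6 as argv lists."""
    scrub_host, keyspace, table = scrub_target
    plan = []
    # Step 1: add the 18 new nodes (done outside nodetool).
    # Step 2: remove data each original node no longer owns.
    for host in original_nodes:
        plan.append(["nodetool", "-h", host, "cleanup"])
    # Step 3: scrub the table that threw exceptions during cleanup.
    plan.append(["nodetool", "-h", scrub_host, "scrub", keyspace, table])
    # Step 4: repair the auth keyspace on the originally troubled node.
    plan.append(["nodetool", "-h", troubled_node, "repair", "system_auth"])
    # Step 5: wait for the repair service to finish a full pass (not nodetool).
    # Step 6: decommission the original nodes one at a time.
    for host in original_nodes:
        plan.append(["nodetool", "-h", host, "decommission"])
    return plan


if __name__ == "__main__":
    for argv in maintenance_plan(ORIGINAL_NODES, TROUBLED_NODE, SCRUB_TARGET):
        print(shlex.join(argv))
```

Each decommission streams that node's data to the remaining nodes, so it is safer to wait for one to finish before starting the next.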

Honestly, I don't know what fixed it. The system_auth repair makes the most sense, but what doesn't make sense is that repair had already run many passes before, so I don't know why it worked this time. I hope this at least helps someone.
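The intuition for why a system_auth repair is the most plausible fix: repair reconciles replicas so that every copy of a row ends up with the newest write. The Python sketch below is a toy model of that end state only, assuming last-write-wins by write timestamp; real repair compares Merkle trees and streams differing ranges, which is not modelled here. The key format and node names are hypothetical.

```python
from typing import Optional

# A cell is (value, write_timestamp); None means the replica has no copy.
Cell = Optional[tuple[frozenset[str], int]]


def newest(cells: list[Cell]) -> Cell:
    """Pick the cell with the highest write timestamp (last write wins)."""
    present = [c for c in cells if c is not None]
    return max(present, key=lambda c: c[1]) if present else None


def repair(replicas: dict[str, dict[str, Cell]]) -> None:
    """Toy anti-entropy: make every replica hold the newest cell for each key."""
    keys = set().union(*(data.keys() for data in replicas.values()))
    for key in keys:
        winner = newest([data.get(key) for data in replicas.values()])
        if winner is None:
            continue
        for data in replicas.values():
            data[key] = winner


# Hypothetical state: replica "d" never received the GRANT.
replicas = {
    "a": {"my_application_user/xxxxx": (frozenset({"SELECT", "MODIFY"}), 100)},
    "d": {},
}
repair(replicas)
print(sorted(replicas["d"]["my_application_user/xxxxx"][0]))  # ['MODIFY', 'SELECT']
```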
