
When I try to fetch data from Amazon Keyspaces with PySpark, I get an "Unsupported partitioner: com.amazonaws.cassandra.DefaultPartitioner" error

I'm not experienced in Java or the Hadoop ecosystem. I configured my Spark cluster to connect to Amazon Keyspaces using the spark-cassandra-connector from DataStax, and I'm using PySpark to fetch data from Cassandra. I can connect to the Keyspaces/Cassandra cluster successfully, but the read fails when I try to fetch data from it:

df = spark.sql("SELECT * FROM cass.tutorialkeyspace.tutorialtable")
print("Table Row Count:")
print(df.count())

I get this error:

Unsupported partitioner: com.amazonaws.cassandra.DefaultPartitioner

Yes, the keyspace and table exist and have data. How can I fix or work around this? Thanks!

As an FYI, Keyspaces now supports the RandomPartitioner, which enables reading and writing data in Apache Spark using the open-source Spark Cassandra Connector.

Docs: https://docs.aws.amazon.com/keyspaces/latest/devguide/spark-integrating.html

Launch announcement: https://aws.amazon.com/about-aws/whats-new/2022/04/amazon-keyspaces-read-write-data-apache-spark/
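For illustration, here is a minimal sketch of how the PySpark side of such a setup might look. The configuration keys are standard Spark Cassandra Connector settings; the helper names (keyspaces_spark_conf, build_session), the region, and the credentials are hypothetical placeholders, and TLS trust store settings are left out. Treat it as a starting point to check against the docs above, not a verified configuration.

```python
def keyspaces_spark_conf(region, username, password, catalog="cass"):
    """Build Spark Cassandra Connector settings for an Amazon Keyspaces endpoint.

    Hypothetical helper: region, username, and password are placeholders.
    """
    return {
        # Registers the catalog so that cass.<keyspace>.<table> resolves in Spark SQL.
        f"spark.sql.catalog.{catalog}":
            "com.datastax.spark.connector.datasource.CassandraCatalog",
        "spark.sql.extensions":
            "com.datastax.spark.connector.CassandraSparkExtensions",
        # Keyspaces service endpoints listen on port 9142 and require TLS.
        "spark.cassandra.connection.host": f"cassandra.{region}.amazonaws.com",
        "spark.cassandra.connection.port": "9142",
        "spark.cassandra.connection.ssl.enabled": "true",
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
        # Assumption: writes to Keyspaces use LOCAL_QUORUM consistency.
        "spark.cassandra.output.consistency.level": "LOCAL_QUORUM",
    }


def build_session(conf, app_name="keyspaces-read"):
    """Create a SparkSession from the settings above (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily so the helper above has no dependencies

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def count_rows(spark, table="cass.tutorialkeyspace.tutorialtable"):
    """Run the same row count as in the question."""
    return spark.sql(f"SELECT * FROM {table}").count()
```

Before reading, it may help to confirm which partitioner the account reports, for example with SELECT partitioner FROM system.local in cqlsh; the read should only work once that returns a partitioner the connector supports.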

The Spark Cassandra Connector relies on a specific partitioner implementation to define data splits, etc. There is no workaround for this problem right now, until somebody adds an implementation of the corresponding TokenFactory to the connector's code. It shouldn't be very complex; it just needs to be done by someone who is interested in it.
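To make the failure mode concrete, here is a simplified Python sketch of the idea: the partitioner name reported by the cluster is looked up to find the token ring bounds, and the ring is then cut into splits that Spark can read in parallel. This is not the connector's actual code (which is written in Scala); the names TOKEN_RANGES, token_range_for, and split_ring are hypothetical, and the bounds are the commonly cited ones for each partitioner.

```python
# Token ring bounds per partitioner class name (illustrative values).
TOKEN_RANGES = {
    "org.apache.cassandra.dht.Murmur3Partitioner": (-2**63, 2**63 - 1),
    "org.apache.cassandra.dht.RandomPartitioner": (-1, 2**127),
}


def token_range_for(partitioner):
    """Return (min_token, max_token) for a known partitioner, else fail."""
    try:
        return TOKEN_RANGES[partitioner]
    except KeyError:
        # An unknown partitioner name leaves no way to compute splits.
        raise ValueError(f"Unsupported partitioner: {partitioner}") from None


def split_ring(partitioner, num_splits):
    """Cut the token ring into num_splits contiguous (start, end) ranges."""
    low, high = token_range_for(partitioner)
    step = (high - low) // num_splits
    bounds = [low + i * step for i in range(num_splits)] + [high]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    print(len(split_ring("org.apache.cassandra.dht.RandomPartitioner", 8)))
    try:
        split_ring("com.amazonaws.cassandra.DefaultPartitioner", 8)
    except ValueError as err:
        print(err)
```

With a lookup like this, a name such as com.amazonaws.cassandra.DefaultPartitioner has no entry, so no splits can be planned and the read fails before any data is fetched.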

Thank you for the feedback. At this time, you can write to Keyspaces using the Spark Cassandra Connector. Reading requires support for token ranges. Please see the following doc page for the list of supported APIs: https://docs.aws.amazon.com/keyspaces/latest/devguide/cassandra-apis.html
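As a rough illustration of what token range support means for reads: a parallel reader typically issues one CQL query per slice of the token ring, using the token() function on the partition key. The sketch below only builds such a query string; the helper name token_range_query and the partition key column "id" are hypothetical, and the exact queries the connector generates may differ.

```python
def token_range_query(keyspace, table, partition_key, start, end):
    """Build a CQL query that reads only rows whose token falls in (start, end]."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} "
        f"AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    # Hypothetical partition key column "id" on the table from the question.
    print(token_range_query("tutorialkeyspace", "tutorialtable", "id", -100, 100))
```

If the service rejects queries of this shape, a reader that plans its work by token range cannot scan the table, which is consistent with writes working while reads fail.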

Although we don't have timelines to share at the moment, we prioritize our roadmap based on customer feedback. We are releasing new features all the time. To learn more about our roadmap and upcoming features, please contact your AWS account manager.


