
spark-cassandra-connector - repartitionByCassandraReplica returns empty RDD - Java

So, I have a 16-node cluster where every node has Spark and Cassandra installed, and I am using the Spark-Cassandra Connector 3.0.0. I am trying to join a dataset with a Cassandra table on the partition key, while also trying to use .repartitionByCassandraReplica.

However, it seems I just get an empty RDD with 0 partitions (the collect() call marked below). Any ideas why?

Encoder<ExperimentForm> ExpEncoder = Encoders.bean(ExperimentForm.class);
// FYI: experimentlist is a List<String>
Dataset<ExperimentForm> dfexplistoriginal = sp.createDataset(experimentlist, Encoders.STRING())
        .toDF("experimentid").as(ExpEncoder);
JavaRDD<ExperimentForm> predf = CassandraJavaUtil.javaFunctions(dfexplistoriginal.toJavaRDD())
        .repartitionByCassandraReplica("mdb", "experiment", experimentlist.size(),
                CassandraJavaUtil.someColumns("experimentid"),
                CassandraJavaUtil.mapToRow(ExperimentForm.class));
System.out.println(predf.collect()); // here it gives an empty RDD with 0 partitions

Dataset<ExperimentForm> newdfexplist = sp.createDataset(predf.rdd(), ExpEncoder);
Dataset<Row> readydfexplist = newdfexplist.as(Encoders.STRING()).toDF("experimentid");

Dataset<Row> metlistinitial = sp.read().format("org.apache.spark.sql.cassandra")
        .options(new HashMap<String, String>() {
            {
                put("keyspace", "mdb");
                put("table", "experiment");
            }
        })
        .load()
        .select(col("experimentid"), col("description"), col("intensity"))
        .join(readydfexplist, "experimentid");

In case it is needed, this is the experiment table in Cassandra:

CREATE TABLE experiment (
    experimentid varchar,
    description text,
    rt float,
    intensity float,
    mz float,
    identifier text,
    chemical_formula text,
    filename text,
    PRIMARY KEY ((experimentid), description, rt, intensity, mz, identifier, chemical_formula, filename)
);

and this is the ExperimentForm class:

public class ExperimentForm {

    private String experimentid;

    public String getExperimentid() {
        return experimentid;
    }
    public void setExperimentid(String experimentid) {
        this.experimentid = experimentid;
    }
}

Let me know if you need any additional information.

The answer is basically the same as in Spark-Cassandra: repartitionByCassandraReplica or converting dataset to JavaRDD and back do not maintain number of partitions?

I just had to do repartitionByCassandraReplica and joinWithCassandraTable at the RDD level and then convert back to a Dataset, as sketched below.
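
For completeness, here is a minimal sketch of what that RDD-level flow can look like, assuming the same SparkSession sp, keyspace mdb, table experiment, experimentlist, and ExperimentForm bean as in the question. The selected columns, the generic CassandraRow reader, and the hand-built schema for converting back to a Dataset<Row> are illustrative choices, not the only way to do it:

import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.GenericJavaRowReaderFactory;
import com.datastax.spark.connector.japi.rdd.CassandraJavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Repartition so each Spark partition is placed on a node that replicates its keys.
JavaRDD<ExperimentForm> repartitioned = javaFunctions(dfexplistoriginal.toJavaRDD())
        .repartitionByCassandraReplica("mdb", "experiment", experimentlist.size(),
                someColumns("experimentid"), mapToRow(ExperimentForm.class));

// Join at the RDD level, so the replica-aware partitioning is actually used.
CassandraJavaPairRDD<ExperimentForm, CassandraRow> joined =
        javaFunctions(repartitioned).joinWithCassandraTable(
                "mdb", "experiment",
                someColumns("experimentid", "description", "intensity"), // columns to read
                someColumns("experimentid"),                             // join key
                GenericJavaRowReaderFactory.instance,
                mapToRow(ExperimentForm.class));

// Only after the join, convert the Cassandra rows back to a Dataset<Row>.
JavaRDD<Row> rowRdd = joined.values().map(r -> RowFactory.create(
        r.getString("experimentid"), r.getString("description"), r.getFloat("intensity")));
StructType schema = new StructType()
        .add("experimentid", DataTypes.StringType)
        .add("description", DataTypes.StringType)
        .add("intensity", DataTypes.FloatType);
Dataset<Row> metlist = sp.createDataFrame(rowRdd, schema);

The key point is that both repartitionByCassandraReplica and joinWithCassandraTable operate on the RDD, so the replica-aware partitioning survives until the join; converting to a Dataset and back first is what loses the partitioning, as the linked question describes.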

I am having the same problem. Did you manage to solve this? Anyone?
