How to use Java connector for Cassandra to get data from dependent column families

Question

I am looking into Apache Spark, Cassandra 3.7 and DataStax's Java connector for Cassandra.

This may be a naive question, but I cannot find the correct way to handle it in the documentation.

I have 2 tables

Cassandra Column Family: Seasons

+-----+--------+----------+
| Id  | Season | isActive |
+-----+--------+----------+
| 001 | Summer | 0        |
| 002 | Fall   | 0        |
| 003 | Spring | 1        |
+-----+--------+----------+

Cassandra Column Family: Fruits

+--------+------------+-----+
| Season | Fruit Name | Id  |
+--------+------------+-----+
| Summer | Fruit1     | 001 |
| Fall   | Fruit2     | 002 |
| Spring | Fruit3     | 003 |
| Spring | Fruit4     | 004 |
| Summer | Fruit5     | 005 |
+--------+------------+-----+

Assume that the Fruits column family is huge, so I do not want to load all of its data into Spark.

First, I want to get the active seasons (in the example above, only "Spring"), and then get the fruits of those active seasons from the Fruits table. I have not been able to do this using DataStax's Java connector for Cassandra. It may be simple, but I think I am missing something and would like another view on it.

So far I have done the following:

JavaRDD<SeasonsClass> seasonsRDD = CassandraJavaUtil.javaFunctions(sc)
            .cassandraTable("myKeySpaceName", "Seasons")
            .map(SeasonsClass.getSeasonsRows())
            .filter(SeasonsClass.filterActiveSeasons());

JavaRDD<FruitsClass> fruitsRDD = CassandraJavaUtil.javaFunctions(sc)
            .cassandraTable("myKeySpaceName", "Fruits")
            .map(FruitsClass.getFruits());

But this gives me all fruits, not only the fruits of the active season.

I can get the list of active seasons, but how do I then fetch only the fruits that belong to those seasons?

I am using the following dependencies:

<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector-java_2.10</artifactId>
    <version>1.6.0-M1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.2</version>
</dependency>

Any help would be appreciated.

Thank you in advance

Answer 1

I think this is probably a data modeling issue. To query your Fruits table by season, you will want to designate the Season column as the partition key and Fruit Name as the clustering column. With that key, a query restricted to one season reads a single partition instead of scanning the whole table. I don't think you would need the Id field for this setup, but it depends on what you are using it for.
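
To make the remodeling idea concrete, here is a minimal plain-Java sketch. It does not call Spark or the connector: the two maps are in-memory stand-ins for the tables in the question, and the class name, method names and the CQL in the comment are hypothetical, chosen only for illustration. It shows the two-step lookup (find the active seasons, then read only the matching season partitions) that the remodeled table makes possible. With the real connector, the second step would presumably be a `where` restriction on the season column of the Fruits table RDD; that part is not tested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SeasonPartitionSketch {

    // Hypothetical CQL for the remodeled table (names are illustrative):
    //   CREATE TABLE fruits (
    //       season text, fruit_name text,
    //       PRIMARY KEY (season, fruit_name));
    // season is the partition key, fruit_name the clustering column.

    /** Stand-in for the remodeled Fruits table: one entry per season partition. */
    static Map<String, List<String>> fruitsBySeason() {
        Map<String, List<String>> partitions = new LinkedHashMap<>();
        partitions.put("Summer", Arrays.asList("Fruit1", "Fruit5"));
        partitions.put("Fall", Arrays.asList("Fruit2"));
        partitions.put("Spring", Arrays.asList("Fruit3", "Fruit4"));
        return partitions;
    }

    /** Stand-in for the Seasons table: season name mapped to its isActive flag. */
    static Map<String, Integer> seasons() {
        Map<String, Integer> rows = new LinkedHashMap<>();
        rows.put("Summer", 0);
        rows.put("Fall", 0);
        rows.put("Spring", 1);
        return rows;
    }

    /** Step 1: keep only the seasons whose isActive flag is 1. */
    static List<String> activeSeasons(Map<String, Integer> seasons) {
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, Integer> row : seasons.entrySet()) {
            if (row.getValue() == 1) {
                active.add(row.getKey());
            }
        }
        return active;
    }

    /** Step 2: read only the partitions of the active seasons, never the whole table. */
    static List<String> fruitsForActiveSeasons(Map<String, Integer> seasons,
                                               Map<String, List<String>> fruits) {
        List<String> result = new ArrayList<>();
        for (String season : activeSeasons(seasons)) {
            result.addAll(fruits.getOrDefault(season, Collections.<String>emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Fruit3, Fruit4]
        System.out.println(fruitsForActiveSeasons(seasons(), fruitsBySeason()));
    }
}
```

The point of the sketch is that the lookup in step 2 is keyed by season, which is what a partition key on Season gives you in Cassandra; the other seasons' rows are never touched.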

solution1
0 2016-08-29 18:12:38