How to use Java connector for Cassandra to get data from dependent column families

I am looking into Apache Spark, Cassandra 3.7 and DataStax's Java connector for Cassandra.

This may be a naive question, but I cannot find the correct way to handle it in the documentation.

I have 2 tables

Cassandra Column Family: Seasons

+-----+--------+----------+
| Id  | Season | isActive |
+-----+--------+----------+
| 001 | Summer | 0        |
| 002 | Fall   | 0        |
| 003 | Spring | 1        |
+-----+--------+----------+

Cassandra Column Family: Fruits

+--------+------------+-----+
| Season | Fruit Name | Id  |
+--------+------------+-----+
| Summer | Fruit1     | 001 |
| Fall   | Fruit2     | 002 |
| Spring | Fruit3     | 003 |
| Spring | Fruit4     | 004 |
| Summer | Fruit5     | 005 |
+--------+------------+-----+

Assume that the Fruits column family is huge, so I do not want to load all of its data into Spark.

First, I want to get the active seasons (in the example above, only "Spring"), and then get the fruits of those active seasons from the Fruits table. I have not been able to do this with DataStax's Java connector for Cassandra. It may be simple, but I think I am missing something and would like another view on it.

So far I have done the following:

JavaRDD<SeasonsClass> seasonsRDD = CassandraJavaUtil.javaFunctions(sc)
            .cassandraTable("myKeySpaceName", "Seasons")
            .map(SeasonsClass.getSeasonsRows())
            .filter(SeasonsClass.filterActiveSeasons());

JavaRDD<FruitsClass> fruitsRDD = CassandraJavaUtil.javaFunctions(sc)
            .cassandraTable("myKeySpaceName", "Fruits")
            .map(FruitsClass.getFruits());

But this gives me all fruits, not only the fruits of the active season. I do get the list of active seasons, but how can I then fetch only the fruits that belong to them?

I am using these dependencies:

<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector-java_2.10</artifactId>
    <version>1.6.0-M1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.2</version>
</dependency>

Any help would be appreciated.

Thank you in advance

I think this is probably a data modeling issue. To query your Fruits table by season, you will want to make the Season column the partition key and Fruit Name the clustering column, so that all fruits of one season live in a single partition and can be read without scanning the whole table. I don't think you would need the Id field for this setup, but that depends on what you are using it for.
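
To make the suggested layout concrete, here is a minimal sketch of the CQL it implies, held in a small Java class so it can be passed to whatever session or connector call you use. The table name fruits_by_season, the snake_case column names, and the text types (id is text only because the sample values have leading zeros) are all assumptions for illustration, not something taken from your schema.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: table name, column names and column types are hypothetical.
final class FruitsBySeasonCql {

    // season is the partition key, fruit_name the clustering column,
    // so all fruits of one season are stored together in one partition.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS fruits_by_season ("
        + "season text, fruit_name text, id text, "
        + "PRIMARY KEY ((season), fruit_name))";

    // Builds a parameterised SELECT restricted to the given partition keys.
    // The season values themselves are bound later, one per '?' placeholder.
    static String selectFruitsFor(List<String> activeSeasons) {
        if (activeSeasons.isEmpty()) {
            throw new IllegalArgumentException("at least one season is required");
        }
        String placeholders =
            String.join(", ", Collections.nCopies(activeSeasons.size(), "?"));
        return "SELECT season, fruit_name, id FROM fruits_by_season WHERE season IN ("
            + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(CREATE_TABLE);
        System.out.println(selectFruitsFor(Arrays.asList("Spring")));
    }
}
```

With this layout, the list of active seasons (small, since it comes from the Seasons table) becomes the set of partition keys to read, and Cassandra only touches those partitions instead of the whole Fruits data.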
