
DSE Cassandra and Spark map collection type: how to perform a get operation

For example, I have the following table named "example":

  name      |       age       |       address

  'abc'     |       12        | {'street':'1', 'city':'kl', 'country':'malaysia'}
  'cab'     |       15        | {'street':'5', 'city':'jakarta', 'country':'indonesia'}

In Spark I can do this:

scala> val test = sc.cassandraTable ("test","example")

and this:

scala> test.first.getString("name")

and this:

scala> test.first.getMap[String, String]("address")

which gives me all the fields of the address in the form of a map

Question 1 : How do I use "get" to access the "city" information?

Question 2 : Is there a way to flatten the entire table?

Question 3 : How do I count the number of rows where "city" = "kl"?

Thanks

Question 3 : How do we count the number of rows where city == something

I'll answer question 3 first because it may give you an easier way to work with the data. Something like:

sc.cassandraTable[(String,Map[String,String],Int)]("test","example")
 .filter( _._2.getOrElse("city","NoCity") == "kl" )
 .count

First, I use the type parameter [(String,Map[String,String],Int)] on my cassandraTable call to transform the rows into tuples. This gives me easy access to the Map without any casting. (The order is just how the columns appeared when I made the table in my test environment; you may have to change the ordering.)

Second, I filter on _._2, which is shorthand for the second element of the incoming tuple. getOrElse returns the value for the key "city" if the key exists and "NoCity" otherwise. The final equality check tests which city it is.

Finally, I call count to find the number of rows whose city is "kl".
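
To see how the predicate behaves when a row has no "city" key, here is a minimal sketch that runs on a plain Scala Seq instead of an RDD, so it needs neither Spark nor Cassandra. The object name CountByCitySketch and the third row ("xyz", with no city) are hypothetical and only there for illustration; the first two rows mirror the table in the question.

```scala
object CountByCitySketch {
  // Local stand-in for the tuples that
  // cassandraTable[(String, Map[String, String], Int)] would produce.
  val rows: Seq[(String, Map[String, String], Int)] = Seq(
    ("abc", Map("street" -> "1", "city" -> "kl", "country" -> "malaysia"), 12),
    ("cab", Map("street" -> "5", "city" -> "jakarta", "country" -> "indonesia"), 15),
    // Hypothetical row without a "city" key: getOrElse falls back to "NoCity".
    ("xyz", Map("street" -> "9", "country" -> "malaysia"), 20)
  )

  // Same predicate as in the Spark job: keep rows whose city is "kl".
  val klCount: Int = rows.count(_._2.getOrElse("city", "NoCity") == "kl")

  def main(args: Array[String]): Unit =
    println(s"rows with city == kl: $klCount")
}
```

The row with the missing key is simply not counted rather than causing an error. On an RDD the equivalent filter followed by count returns a Long rather than an Int.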

Question 1 : How do we access the map?

So the answer to 1 is that once you have a Map, you can call get("key"), getOrElse("key", default), or any of the standard Scala operations to get a value out of the map.
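
As a rough illustration of those standard operations, the sketch below uses a plain Scala Map holding the same address data as the first row of the table. The object name MapAccessSketch and the "zip" key are made up for the example; "zip" is there only to show what happens when a key is absent.

```scala
object MapAccessSketch {
  // Same shape as the "address" column of the first row.
  val address: Map[String, String] =
    Map("street" -> "1", "city" -> "kl", "country" -> "malaysia")

  // get returns an Option: Some(value) if the key exists, None otherwise.
  val cityOpt: Option[String] = address.get("city")
  val zipOpt: Option[String] = address.get("zip") // "zip" is a hypothetical missing key

  // getOrElse takes the key and a default to use when the key is missing.
  val cityOrDefault: String = address.getOrElse("city", "NoCity")
  val zipOrDefault: String = address.getOrElse("zip", "NoZip")

  // contains checks for a key without retrieving the value.
  val hasCity: Boolean = address.contains("city")

  // Note: address("zip") would throw NoSuchElementException,
  // so get or getOrElse is safer when a key may be absent.

  def main(args: Array[String]): Unit =
    println(s"$cityOpt $zipOpt $cityOrDefault $zipOrDefault $hasCity")
}
```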

Question 2 : How to flatten the entire table

Depending on what you mean by "flatten", this can be a variety of things. For example, if you want to return the entire table as an array to the driver, you can call collect. (This is not recommended, since your RDD will likely be very big in production.)

If you want to flatten the elements of your map into tuples, you can call toSeq on the map and you will end up with a sequence of (key,value) tuples. Feel free to ask another question if I haven't answered what you want with "flattening."
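
Here are two possible readings of "flatten", sketched on a local Seq that stands in for the RDD of tuples. This is only an assumption about what you might want, and the object name FlattenSketch is made up. Option A emits one record per map entry using toSeq; option B pulls fixed keys out into their own columns, using an empty string as an arbitrary default for missing keys.

```scala
object FlattenSketch {
  // Local stand-in for the (name, address, age) tuples from the table.
  val rows: Seq[(String, Map[String, String], Int)] = Seq(
    ("abc", Map("street" -> "1", "city" -> "kl", "country" -> "malaysia"), 12),
    ("cab", Map("street" -> "5", "city" -> "jakarta", "country" -> "indonesia"), 15)
  )

  // Option A: one output record per map entry, as (name, age, key, value).
  val perEntry: Seq[(String, Int, String, String)] =
    rows.flatMap { case (name, address, age) =>
      address.toSeq.map { case (key, value) => (name, age, key, value) }
    }

  // Option B: one wide record per row, with the known keys as columns.
  // The empty-string default for a missing key is an arbitrary choice.
  val wide: Seq[(String, Int, String, String, String)] =
    rows.map { case (name, address, age) =>
      (name, age,
        address.getOrElse("street", ""),
        address.getOrElse("city", ""),
        address.getOrElse("country", ""))
    }

  def main(args: Array[String]): Unit = {
    perEntry.foreach(println)
    wide.foreach(println)
  }
}
```

Since RDDs also provide map and flatMap, the same two transformations should carry over to the result of cassandraTable with little change.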
