

Lazy Cassandra load with Spark

I want to know whether it is good practice to load a Cassandra table lazily and then apply a where clause to it later.

For example:

lazy val table = sparkContext.cassandraTable[Type](keyspace, tableName)

// ... other parts of the code ...

table.where("column = ?",param)

Thanks!

All RDDs are lazy by default. They won't actually do anything until you call an action. So don't add lazy: it only delays the creation of the metadata object that describes your RDD and doesn't actually affect execution.
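To see what lazy val does and does not defer, here is a small plain-Scala sketch with no Spark involved. FakeTable is a hypothetical stand-in for an RDD, assumed (like an RDD) to be a cheap handle whose construction reads no data; rows are only pulled when something consumes them. Marking the val lazy postpones building the handle until first access, but the row reading is deferred either way until an "action" runs.

```scala
object LazyValDemo {
  var handlesCreated = 0 // how many times the "RDD" handle was constructed
  var rowsRead = 0       // how many rows were actually pulled from the source

  // Hypothetical stand-in for an RDD: constructing it reads no data.
  final class FakeTable {
    handlesCreated += 1
    def rows: Iterator[Int] = Iterator.range(0, 3).map { x => rowsRead += 1; x }
  }

  def main(args: Array[String]): Unit = {
    // Reset counters so main can be run more than once.
    handlesCreated = 0
    rowsRead = 0

    val eager = new FakeTable          // handle built now, but no rows read
    lazy val deferred = new FakeTable  // handle not built until first access
    assert(handlesCreated == 1 && rowsRead == 0)

    // First access builds the deferred handle; consuming rows acts as the "action".
    val n = deferred.rows.size
    assert(handlesCreated == 2 && rowsRead == 3 && n == 3)
    println(s"handles=$handlesCreated rowsRead=$rowsRead")
  }
}
```

In both cases no data is touched until the rows are consumed, which is why lazy buys you nothing on an RDD.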

Example

val table = sparkContext.cassandraTable[Type](keyspace, tableName)
val tableWithWhere = table.where("x = 5") // adds a CQL WHERE predicate; nothing is read yet
// turnXIntoY is your own conversion function
val tableTransformed = tableWithWhere.map((x: Type) => turnXIntoY(x))
// Nothing has happened in C* or in Spark on the executors yet
tableTransformed.collect() // This action is what makes Spark start doing work
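If you want to observe the same deferred evaluation without a cluster, a plain Scala Iterator behaves analogously: filter and map only describe work, and forcing the iterator (here with toList, playing the role of collect) is what actually pulls rows. This is only an analogy, not the connector's API; source is a hypothetical stand-in for the Cassandra table. One real difference: the connector sends the where predicate to Cassandra as part of the CQL query, so filtering happens server-side, whereas Iterator.filter here has to read every row.

```scala
object LazinessDemo {
  // Counts how many rows the stand-in source has actually produced.
  var rowsRead = 0

  // Hypothetical stand-in for a Cassandra table: reading a row bumps the counter.
  def source: Iterator[Int] = Iterator.range(0, 10).map { x => rowsRead += 1; x }

  def main(args: Array[String]): Unit = {
    rowsRead = 0 // reset so main can be run more than once

    val filtered = source.filter(_ == 5)    // plays the role of table.where("x = 5")
    val transformed = filtered.map(_ * 100) // plays the role of .map(turnXIntoY)
    assert(rowsRead == 0)                   // nothing has been read yet

    val result = transformed.toList         // plays the role of .collect(): forces evaluation
    assert(result == List(500))
    assert(rowsRead == 10)                  // every row was pulled once evaluation was forced
    println(s"result=$result rowsRead=$rowsRead")
  }
}
```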
