Query cassandra table from spark without using case classes
I am using datastax's connector for connecting to cassandra. Below is the code that I used:
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._

// Configure the Cassandra connection before creating the contexts.
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "the_username")
  .set("spark.cassandra.auth.password", "the_password")

// The second argument is the Spark application name, not a keyspace.
val sc = new SparkContext("local", "the_keyspace", conf)
val sqlContext = new SQLContext(sc)

val table_1 = sc.cassandraTable("the_keyspace", "table_1")
val table_2 = sc.cassandraTable("the_keyspace", "table_2")
Now, the way to expose this table as an RDD is to use a case class as a placeholder, as below:
case class Person(name: String, age: Int)
sc.cassandraTable[Person]("test", "persons").registerAsTable("persons")
This works fine, but each of my tables has around 50+ columns, and it is a real pain to type them all out in a case class and identify their types.
Is there a way to overcome this? I am used to loading a csv file as a table using databricks-csv: I can register the files as tables and run queries on them without a case class placeholder. Is there something similar for my use case here?
If there is none, it would be helpful if there were some generators that I can use to auto-generate these case classes.
You can create a data frame directly:
val df = sqlContext
.read.format("org.apache.spark.sql.cassandra")
.options(Map("keyspace" -> "test", "table" -> "persons"))
.load()
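The data frame's schema is inferred from the Cassandra table metadata, so no case class is needed. A minimal sketch of querying it with SQL, assuming Spark 1.x (where the registration method is registerTempTable) and the name/age columns from the Person example above:

df.printSchema()  // columns and types come from Cassandra's table metadata

// Register the data frame so it can be queried with SQL,
// just like a table loaded with databricks-csv.
df.registerTempTable("persons")
val adults = sqlContext.sql("SELECT name, age FROM persons WHERE age > 18")
adults.show()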