Query cassandra table from spark without using case classes

I am using DataStax's connector to connect to Cassandra.

Below is the code that I used:

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._

// Build the configuration first, then create the contexts that depend on it
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "the_username")
  .set("spark.cassandra.auth.password", "the_password")

val sc = new SparkContext("local", "the_app_name", conf) // args: master, application name, config
val sqlContext = new SQLContext(sc) // create the SQLContext after the SparkContext it wraps

val table_1 = sc.cassandraTable("the_keyspace", "table_1")
val table_2 = sc.cassandraTable("the_keyspace", "table_2")

Now, the way to expose such a table as an RDD is to use a case class as a placeholder, as below:

case class Person(name: String, age: Int)
sc.cassandraTable[Person]("test", "persons").registerAsTable("persons")

This works fine, but I have around 50+ columns in each table, and it is a real pain to type them all out in a case class and identify their types.

Is there a way to overcome this? I am used to loading a CSV file as a table with databricks-csv: I can register the files as tables and run queries on them without a case-class placeholder. Is there something similar for my use case here?
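
For reference, the spark-csv pattern I mean looks roughly like this (the file path and table name are placeholders):

val csvDf = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line holds the column names
  .option("inferSchema", "true") // let the reader guess the column types
  .load("path/to/file.csv")
csvDf.registerTempTable("my_csv_table")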

If there are none, it would be helpful to know of any generators I can use to auto-generate these case classes.
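
As an aside, here is a minimal sketch of such a generator, assuming the table has already been loaded as a DataFrame (as in the answer below). caseClassFor is a hypothetical helper, and its type mapping covers only a few common cases:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types._

// Hypothetical helper: builds a case class definition string from a DataFrame's schema
def caseClassFor(className: String, df: DataFrame): String = {
  val fields = df.schema.fields.map { f =>
    val scalaType = f.dataType match {
      case IntegerType => "Int"
      case LongType    => "Long"
      case DoubleType  => "Double"
      case BooleanType => "Boolean"
      case _           => "String" // fall back to String for anything else
    }
    s"${f.name}: $scalaType"
  }
  s"case class $className(${fields.mkString(", ")})"
}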

You can create a DataFrame directly:

// The Cassandra data source infers the schema from the table, so no case class is needed
val df = sqlContext
  .read.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "persons"))
  .load()
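
From there, the DataFrame can be registered as a temporary table and queried with plain SQL, again without a case class. A minimal sketch, assuming the persons table has name and age columns:

df.registerTempTable("persons")
val adults = sqlContext.sql("SELECT name, age FROM persons WHERE age > 30")
adults.show()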
