Saving JavaList to Cassandra table using spark context
Hi, I am new to Spark and Scala, and I have run into a problem saving data to Cassandra. Here is my situation:
1) I get a list of user-defined objects (for example, User objects containing firstName, lastName, etc.) from a Java class into a Scala class. So far I can access the User objects and print their contents.
2) Now I want to save the usersList to a Cassandra table using the Spark context. I have gone through many examples, but in every one of them a Seq is created from a case class with hard-coded values and then saved to Cassandra. I tried that and it works fine for me, as shown below:
import scala.collection.JavaConversions._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._
import java.util.ArrayList

object SparkCassandra extends App {
  val conf = new SparkConf()
    .setMaster("local[*]")
    .setAppName("SparkCassandra")
    // set the Cassandra host address to your local address
    .set("spark.cassandra.connection.host", "127.0.0.1")
  val sc = new SparkContext(conf)

  val usersList = Test.getUsers
  usersList.foreach(x => print(x.getFirstName))

  val collection = sc.parallelize(Seq(userTable("testName1"), userTable("testName1")))
  collection.saveToCassandra("demo", "user", SomeColumns("name"))
  sc.stop()
}

case class userTable(name: String)
But here my requirement is to use the dynamic values from my usersList instead of the hard-coded ones, or any other way to achieve this.
If you create an RDD of CassandraRow objects, you can save the result directly without specifying columns or a case class. In addition, CassandraRow has an extremely convenient fromMap function, so you can define your rows as Map objects, convert them, and save the result.
Example:
val myData = sc.parallelize(
  Seq(
    Map("name" -> "spiffman", "address" -> "127.0.0.1"),
    Map("name" -> "Shabarinath", "address" -> "127.0.0.1")
  )
)

val cassandraRowData = myData.map(rowMap => CassandraRow.fromMap(rowMap))
cassandraRowData.saveToCassandra("keyspace", "table")
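To use this approach with the dynamic usersList from the question, each User would first be turned into a column-name-to-value Map before CassandraRow.fromMap is applied. That mapping step is plain Java, independent of Spark; here is a minimal sketch (the RowMapper class name, the column names, and the trimmed-down User stand-in are illustrative, not part of the original code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowMapper {
    // Trimmed-down stand-in for the User POJO from the question.
    public static class User {
        private final String firstName;
        private final String lastName;
        public User(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
        public String getFirstName() { return firstName; }
        public String getLastName() { return lastName; }
    }

    // Convert each User into a column-name -> value map, which is the
    // shape CassandraRow.fromMap expects on the Scala side.
    public static List<Map<String, String>> toRowMaps(List<User> users) {
        List<Map<String, String>> rows = new ArrayList<Map<String, String>>();
        for (User user : users) {
            Map<String, String> row = new HashMap<String, String>();
            row.put("first_name", user.getFirstName());
            row.put("last_name", user.getLastName());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<User>();
        users.add(new User("Ann", "Lee"));
        System.out.println(toRowMaps(users).get(0).get("first_name")); // prints Ann
    }
}
```

The resulting list of maps could then be handed to sc.parallelize on the Scala side and mapped through CassandraRow.fromMap exactly as in the example above.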
In the end, I arrived at a solution, tested it against my requirements, and it works correctly as shown below:
My Scala code:
import scala.collection.JavaConversions.asScalaBuffer
import scala.reflect.runtime.universe
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.toNamedColumnRef
import com.datastax.spark.connector.toRDDFunctions

object JavaListInsert {
  def randomStores(sc: SparkContext, users: List[User]): RDD[(String, String, String)] = {
    sc.parallelize(users).map { x =>
      val firstName = x.getFirstName
      val lastName = x.getLastName
      val city = x.getCity
      (firstName, lastName, city)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cassandraInsert")
    val sc = new SparkContext(conf)
    val usersList = Test.getUsers.toList
    randomStores(sc, usersList).
      saveToCassandra("test", "stores", SomeColumns("first_name", "last_name", "city"))
    sc.stop()
  }
}
The Java POJO:
import java.io.Serializable;

public class User implements Serializable {
    private static final long serialVersionUID = -187292417543564400L;

    private String firstName;
    private String lastName;
    private String city;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }
}
The Java class that returns the list of users:
import java.util.ArrayList;
import java.util.List;

public class Test {
    public static List<User> getUsers() {
        ArrayList<User> usersList = new ArrayList<User>();
        for (int i = 1; i <= 100; i++) {
            User user = new User();
            user.setFirstName("firstName_+" + i);
            user.setLastName("lastName_+" + i);
            user.setCity("city_+" + i);
            usersList.add(user);
        }
        return usersList;
    }
}