
Saving JavaList to Cassandra table using spark context

Hi, I am new to Spark and Scala, and I have a problem saving data to Cassandra. Here is my scenario:

1) I get a list of user-defined objects (for example, User objects containing firstName, lastName, etc.) from a Java class into a Scala class. So far I can access a User object and print its contents.

2) Now I want to save the usersList to a Cassandra table using the Spark context. I went through many examples, but in every one of them I saw a Seq created from a case class with hardcoded values and then saved to Cassandra. I tried that and it works fine for me, as follows:

import scala.collection.JavaConversions._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

import com.datastax.spark.connector._

object SparkCassandra extends App {
    val conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("SparkCassandra")
        // set the Cassandra host to your local address
        .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val usersList = Test.getUsers
    usersList.foreach(x => print(x.getFirstName))
    // This works, but the values are hardcoded instead of coming from usersList
    val collection = sc.parallelize(Seq(userTable("testName1"), userTable("testName1")))
    collection.saveToCassandra("demo", "user", SomeColumns("name"))
    sc.stop()
}

case class userTable(name: String)

But here my requirement is to use the dynamic values from my usersList rather than hardcoded values, or any other way to achieve this. Roughly, I want something like the sketch below.
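A hypothetical sketch of the intent, reusing the userTable case class from above (mapping over the Java usersList relies on the JavaConversions import already in scope):

// Build the rows from usersList instead of hardcoding them
val dynamicRows = sc.parallelize(usersList.map(u => userTable(u.getFirstName)))
dynamicRows.saveToCassandra("demo", "user", SomeColumns("name"))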

If you create an RDD of CassandraRow objects, you can save the result directly without specifying columns or a case class. In addition, CassandraRow has an extremely convenient fromMap function, so you can define your rows as Map objects, convert them, and save the result.

Example:

import com.datastax.spark.connector._  // brings CassandraRow and saveToCassandra into scope

val myData = sc.parallelize(
  Seq(
    Map("name" -> "spiffman", "address" -> "127.0.0.1"),
    Map("name" -> "Shabarinath", "address" -> "127.0.0.1")
  )
)

val cassandraRowData = myData.map(rowMap => CassandraRow.fromMap(rowMap))

cassandraRowData.saveToCassandra("keyspace", "table")
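Applied to the User list from the question, the same idea could look roughly like this (an untested sketch: it assumes the JavaConversions import so that map works on the Java usersList, and the keyspace, table, and column names are placeholders that must match your schema):

// Turn each User into a Map, then into a CassandraRow
val userRows = sc.parallelize(usersList.map { u =>
  CassandraRow.fromMap(Map(
    "first_name" -> u.getFirstName,
    "last_name" -> u.getLastName,
    "city" -> u.getCity
  ))
})
userRows.saveToCassandra("demo", "user")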

Finally, I arrived at a solution that I tested against my requirements, and it works fine, as shown below:

My Scala code:

import scala.collection.JavaConversions.asScalaBuffer
import scala.reflect.runtime.universe
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.toNamedColumnRef
import com.datastax.spark.connector.toRDDFunctions

object JavaListInsert {
  def randomStores(sc: SparkContext, users: List[User]): RDD[(String, String, String)] = {
    sc.parallelize(users).map { x =>
      val firstName = x.getFirstName
      val lastName = x.getLastName
      val city = x.getCity
      (firstName, lastName, city)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cassandraInsert")
    val sc = new SparkContext(conf)
    val usersList = Test.getUsers.toList
    randomStores(sc, usersList).
      saveToCassandra("test", "stores", SomeColumns("first_name", "last_name", "city"))
    sc.stop()
  }
}
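Note that, unlike the first example, this SparkConf sets neither a master nor a Cassandra contact point, so both must be supplied at submit time. For a self-contained local test, the settings from the question's snippet could be added back (a sketch; 127.0.0.1 assumes a Cassandra node running locally):

val conf = new SparkConf()
  .setAppName("cassandraInsert")
  .setMaster("local[*]")                                // run locally for testing
  .set("spark.cassandra.connection.host", "127.0.0.1")  // local Cassandra node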

Java POJO:

import java.io.Serializable;

public class User implements Serializable {
    private static final long serialVersionUID = -187292417543564400L;
    private String firstName;
    private String lastName;
    private String city;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }
}

Java class that returns the list of users:

import java.util.ArrayList;
import java.util.List;


public class Test {
    public static List<User> getUsers() {
        ArrayList<User> usersList = new ArrayList<User>();
        for (int i = 1; i <= 100; i++) {
            User user = new User();
            user.setFirstName("firstName_" + i);
            user.setLastName("lastName_" + i);
            user.setCity("city_" + i);
            usersList.add(user);
        }
        return usersList;
    }
}
