
Saving a Java List to a Cassandra table using the Spark context

Hi, I am new to Spark and Scala, and I am running into a problem saving data to Cassandra. Here is my scenario:

1) I receive a list of user-defined objects (for example, User objects containing firstName, lastName, etc.) from a Java class into my Scala class. So far I can access the User objects and print their contents.

2) Now I want to save that usersList to a Cassandra table using the Spark context. I have gone through many examples, but everywhere I see a Seq built with a case class and hard-coded values, which is then saved to Cassandra. I tried that and it works fine for me, as shown below:

import scala.collection.JavaConversions._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

import com.datastax.spark.connector._
import java.util.ArrayList

object SparkCassandra extends App {
    val conf = new SparkConf()
        .setMaster("local[*]")
        .setAppName("SparkCassandra")
        //set Cassandra host address as your local address
        .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val usersList = Test.getUsers
    usersList.foreach(x => print(x.getFirstName))
    val collection = sc.parallelize(Seq(userTable("testName1"), userTable("testName1")))
    collection.saveToCassandra("demo", "user", SomeColumns("name"))
    sc.stop()
}

case class userTable(name: String)

But my requirement here is to save the dynamic values from my usersList instead of hard-coded values. Is there a way to do that, or any other way to achieve this?

If you create an RDD of CassandraRow objects, you can save the result directly without specifying columns or a case class. In addition, CassandraRow has an extremely handy fromMap function, so you can define your rows as Map objects, convert them, and save the result.

Example:

// CassandraRow and the saveToCassandra implicit both come from the connector package
import com.datastax.spark.connector._

val myData = sc.parallelize(
  Seq(
    Map("name" -> "spiffman", "address" -> "127.0.0.1"),
    Map("name" -> "Shabarinath", "address" -> "127.0.0.1")
  )
)

val cassandraRowData = myData.map(rowMap => CassandraRow.fromMap(rowMap))

cassandraRowData.saveToCassandra("keyspace", "table")
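
Applied to the question's setup, the same idea works with the dynamic usersList instead of literals. This is a minimal sketch, assuming the Test and User classes from the question and the demo.user table with its single name column:

import scala.collection.JavaConversions._
import com.datastax.spark.connector._

// Build one Map-backed CassandraRow per User from the Java list
val userRows = sc.parallelize(Test.getUsers.toList).map { u =>
  CassandraRow.fromMap(Map("name" -> u.getFirstName))
}
// Column names are taken from the row keys, so no SomeColumns needed
userRows.saveToCassandra("demo", "user")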

In the end, I arrived at a solution that I tested against my requirement, and it works fine, as shown below:

My Scala code:

import scala.collection.JavaConversions.asScalaBuffer
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.toNamedColumnRef
import com.datastax.spark.connector.toRDDFunctions

object JavaListInsert {
  // Convert each User POJO into a (first_name, last_name, city) tuple
  def randomStores(sc: SparkContext, users: List[User]): RDD[(String, String, String)] = {
    sc.parallelize(users).map { x =>
      val firstName = x.getFirstName
      val lastName = x.getLastName
      val city = x.getCity
      (firstName, lastName, city)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cassandraInsert")
    val sc = new SparkContext(conf)
    val usersList = Test.getUsers.toList
    randomStores(sc, usersList).
      saveToCassandra("test", "stores", SomeColumns("first_name", "last_name", "city"))
    sc.stop
  }
}
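
The question never shows the DDL for test.stores. A plausible schema, created here through the connector's CassandraConnector helper, might look like the sketch below; the primary-key choice is only an assumption for illustration:

import com.datastax.spark.connector.cql.CassandraConnector

// Hypothetical DDL for the target keyspace/table; the real schema isn't
// shown above, so the primary key below is just an assumption.
CassandraConnector(conf).withSessionDo { session =>
  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS test WITH replication = " +
    "{'class': 'SimpleStrategy', 'replication_factor': 1}")
  session.execute(
    "CREATE TABLE IF NOT EXISTS test.stores " +
    "(first_name text PRIMARY KEY, last_name text, city text)")
}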

Java POJO class (note that it implements Serializable, which Spark requires in order to ship the objects to executors):

import java.io.Serializable;

public class User implements Serializable {
    private static final long serialVersionUID = -187292417543564400L;
    private String firstName;
    private String lastName;
    private String city;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }
}

Java class that returns the list of users:

import java.util.ArrayList;
import java.util.List;

public class Test {
    public static List<User> getUsers() {
        ArrayList<User> usersList = new ArrayList<User>();
        for (int i = 1; i <= 100; i++) {
            User user = new User();
            user.setFirstName("firstName_" + i);
            user.setLastName("lastName_" + i);
            user.setCity("city_" + i);
            usersList.add(user);
        }
        return usersList;
    }
}
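
For completeness, a quick way to sanity-check the write is to read the rows back through the connector. This is a minimal sketch, assuming the same SparkContext as above and that the test.stores table has been populated:

import com.datastax.spark.connector._

// Read a few rows back from test.stores to confirm the insert worked
val stored = sc.cassandraTable("test", "stores")
  .select("first_name", "last_name", "city")
stored.take(5).foreach(println)
println(s"total rows: ${stored.count}")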
