[英]Spark Scala Cassandra CSV insert into cassandra
這是下面的代碼:Scala版本:2.11。 Spark版本:2.0.2.6 Cassandra版本:cqlsh 5.0.1 | 卡桑德拉3.11.0.1855 | DSE 5.1.3 | CQL規范3.4.4 | 本機協議v4
我正在嘗試從CSV讀取並寫入Cassandra Table。 我是Scala和Spark的新手。 請糾正我我做錯了的地方
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}
import com.datastax
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import com.datastax.spark.connector._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}
import org.apache.spark.sql._
import com.datastax.spark.connector.UDTValue
import com.datastax.spark.connector.mapper.DefaultColumnMapper
object dataframeset {
def main(args: Array[String]): Unit = {
// Cassandra Part
val conf = new SparkConf().setAppName("Sample1").setMaster("local[*]")
val sc = new SparkContext(conf)
sc.setLogLevel("ERROR")
val rdd1 = sc.cassandraTable("tdata", "map")
rdd1.collect().foreach(println)
// Scala Read CSV Part
Logger.getLogger("org").setLevel(Level.ERROR)
Logger.getLogger("akka").setLevel(Level.ERROR)
val spark1 = org.apache.spark.sql.SparkSession
.builder()
.master("local")
.appName("Spark SQL basic example")
.getOrCreate()
val df = spark1.read.format("csv")
.option("header","true")
.option("inferschema", "true")
.load("/Users/tom/Desktop/del2.csv")
import spark1.implicits._
df.printSchema()
val dfprev = df.select(col = "Year","Measure").filter("Category = 'Prevention'" )
// dfprev.collect().foreach(println)
val a = dfprev.select("YEAR")
val b = dfprev.select("Measure")
val collection = sc.parallelize(Seq(a,b))
collection.saveToCassandra("tdata", "map", SomeColumns("sno", "name"))
spark1.stop()
}
}
錯誤:
Exception in thread "main" java.lang.IllegalArgumentException: Multiple constructors with the same number of parameters not allowed.
卡桑德拉表
cqlsh:tdata> desc映射
創建表tdata.map(sno int PRIMARY KEY,名稱文本;
我知道我特別想嘗試一次將整個Data frame寫入Cassandra的事情。 不,我也不知道需要做什么。
謝謝湯姆
您可以直接將數據幀(spark 2.x中的dataset [Row])寫入cassandra。
如果在spark conf中啟用了身份驗證,則必須定義cassandra主機,用戶名和密碼,才能使用諸如
val conf = new SparkConf(true)
.set("spark.cassandra.connection.host", "CASSANDRA_HOST")
.set("spark.cassandra.auth.username", "CASSANDRA_USERNAME")
.set("spark.cassandra.auth.password", "CASSANDRA_PASSWORD")
要么
val spark1 = org.apache.spark.sql.SparkSession
.builder()
.master("local")
.config("spark.cassandra.connection.host", "CASSANDRA_HOST")
.config("spark.cassandra.auth.username", "CASSANDRA_USERNAME")
.config("spark.cassandra.auth.password", "CASSANDRA_PASSWORD")
.appName("Spark SQL basic example")
.getOrCreate()
val dfprev = df.filter("Category = 'Prevention'" ).select(col("Year").as("yearAdded"),col("Measure").as("Recording"))
dfprev .write
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "map", "keyspace" -> "tdata"))
.save()
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.