How to turn Redis into a Spark Dataset or DataFrame?
I am trying to use Redis as a source for Spark SQL, but I am stuck on how to convert the RDD. Here is my code:
```java
import java.util.*;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;
import scala.Tuple2;

// Read key/value pairs matching "user:*" from Redis (spark-redis API).
RDD<Tuple2<String, String>> rdd1 = rc.fromRedisKV("user:*", 3, redisConfig);

// Map each comma-separated value string into a Row.
JavaRDD<Row> userRDD = rdd1.toJavaRDD().map(new Function<Tuple2<String, String>, Row>() {
    public Row call(Tuple2<String, String> tuple2) throws Exception {
        System.out.println(tuple2._2());
        return RowFactory.create(tuple2._2().split(","));
    }
});

List<StructField> structFields = new ArrayList<>();
structFields.add(DataTypes.createStructField("name", DataTypes.StringType, true));
structFields.add(DataTypes.createStructField("sex", DataTypes.StringType, false));
structFields.add(DataTypes.createStructField("age", DataTypes.IntegerType, false));
StructType structType = DataTypes.createStructType(structFields);

Dataset<Row> ds = spark.createDataFrame(userRDD, structType);
ds.createOrReplaceTempView("user");
ds.printSchema();

String sql = "select name, sex, age from user";
List<Row> list2 = spark.sql(sql).collectAsList();
```
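(As an editorial aside: the schema above declares age as IntegerType, while split(",") yields Strings for every field. A hedged sketch of a small helper, with a hypothetical name, that parses each Redis value into typed field values before building the Row:)

```java
// Hedged sketch (UserRowParser and parseUser are hypothetical names): parse a
// Redis value of the form "name,sex,age" into typed field values, so the third
// field matches a schema that declares age as IntegerType.
public class UserRowParser {
    static Object[] parseUser(String value) {
        String[] parts = value.split(",", -1);
        return new Object[] {
            parts[0].trim(),                   // name -> String
            parts[1].trim(),                   // sex  -> String
            Integer.valueOf(parts[2].trim())   // age  -> Integer
        };
    }

    public static void main(String[] args) {
        Object[] fields = parseUser("alice,F,30");
        System.out.println(fields[0] + "/" + fields[1] + "/" + fields[2]);
    }
}
```

Inside the map function, `RowFactory.create(UserRowParser.parseUser(tuple2._2()))` would then produce rows whose field types line up with the declared StructType.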
I get the following exception:

```
Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
```
I have no idea what to do next. Any help would be appreciated!
I finally found the cause: there was nothing wrong with my code, but I needed to upload my application's jar to the Spark server.
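For reference, this ClassCastException typically appears when the executors cannot deserialize classes defined in the driver application because the application jar is not on their classpath. A minimal sketch of registering the jar programmatically, assuming a standalone master at spark://master:7077 and a jar at /path/to/app.jar (both placeholders):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public class SubmitWithJar {
    public static void main(String[] args) {
        // The master URL and jar path below are placeholders.
        SparkConf conf = new SparkConf()
                .setAppName("redis-to-dataframe")
                .setMaster("spark://master:7077")
                // Ship the application jar so executors can deserialize
                // classes defined in the driver, e.g. the anonymous Function.
                .setJars(new String[] { "/path/to/app.jar" });

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        spark.stop();
    }
}
```

Equivalently, the jar is shipped automatically when the application is launched with `spark-submit --master spark://master:7077 --class <main-class> /path/to/app.jar`.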