Type mismatch error: found Array[String], required Seq[?] in Scala Spark
Below are the code and the build error I am seeing. Can you tell me how I can resolve this error? This is the complete code; URLs have been omitted. The Spark version used is 1.6.0 and the Scala version used is 2.10.5.
Code
import java.net.{HttpURLConnection, URL}
import org.slf4j.LoggerFactory
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}
import java.util.Arrays
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType._
import scala.Predef.exceptionWrapper
object CoExtract {
private val logger = LoggerFactory.getLogger(getClass)
def main(args: Array[String]) {
val sparkConf = new SparkConf()
val sc = new SparkContext(sparkConf)
val sqlcontext = new SQLContext(sc)
// val sqlcontext = new HiveContext(sc)
import sqlcontext.implicits._
val obj = new Connection
val string_response=obj.getCsvResponse(obj.getConnection(new URL("https://")))
val array_response = string_response.split("\n")
logger.info("The length of the array is "+ array_response.length)
val rdd_response=sc.parallelize(array_response.toSeq)
logger.info("The count of elements in the rdd are "+rdd_response.count())
val header = rdd_response.first()
logger.info("header"+header)
val noheaderRDD = rdd_response.filter(_ != header)
logger.info("NoheaderRDD is"+noheaderRDD.first())
val subsetRdd=noheaderRDD.map( x => (Row(
x.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)",-1)(0),
x.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)",-1)(1),
x.split(",")(2)
)
)
)
val x =subsetRdd.zipWithIndex().collect()
val schema = new StructType()
.add(StructField("Email",StringType, true))
.add(StructField("Recipient",StringType, true))
.add(StructField("rowid", LongType, false))
val rdd_to_df = sqlcontext.createDataFrame(subsetRdd,schema)
val df_to_rdd_again = rdd_to_df.rdd.zipWithIndex
rdd_to_df.withColumn("rowid", row_number.over(Window.partitionBy(lit(1)).orderBy(lit(1))))
val final_df = sqlcontext.createDataFrame(df_to_rdd_again.map{case (row,index) => Row.fromSeq(row.toSeq ++ Seq(index))}, schema )
val start: Int = 0
val end: Int = rdd_to_df.count().toInt
var counter:Int = start
logger.info("Final Count" + end)
logger.info("The schema of the dataframe is "+rdd_to_df.printSchema())
final_df.show(100,false)
logger.info("Schema of rdd to df" + rdd_to_df.printSchema())
logger.info("schema of final_df" + final_df.printSchema())
val df_response = sqlcontext.read.format("com.databricks.spark.csv").option("header", "true").load("hdfs:///")
logger.info("The schema of the dataframe is "+df_response)
logger.info("The count of the dataframe is "+df_response.count())
}
}
Build Error
scala:48: error: type mismatch;
[ERROR] found : Array[String]
[ERROR] required: Seq[?]
[ERROR] Error occurred in an application involving default arguments.
[INFO] val rdd_response=sc.parallelize(array_response)
[INFO] ^
Just convert the Array to a Seq. The line "Error occurred in an application involving default arguments" is the clue: parallelize takes a default numSlices parameter, and in that situation the compiler does not apply the implicit Array-to-Seq conversion for you, so do it explicitly:
val rdd_response=sc.parallelize(array_response.toSeq)
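The fix works because a Scala `Array` is backed by a raw Java array and is not itself a `scala.collection.Seq`; normally the compiler inserts an implicit wrapping conversion, but here (per the "default arguments" note in the error) it does not, so `.toSeq` makes the conversion explicit. A minimal plain-Scala sketch of the conversion, independent of Spark:

```scala
object ToSeqDemo {
  def main(args: Array[String]): Unit = {
    // split returns Array[String], just like string_response.split("\n") above
    val arr: Array[String] = "a,b,c".split(",")

    // Explicit conversion: wraps the array in a Seq without copying semantics changing
    val seq: Seq[String] = arr.toSeq

    assert(seq == Seq("a", "b", "c"))
    println(seq.length)
  }
}
```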