Type mismatch error. found Array[String] requires Seq[?] in Scala Spark

Below are the code and the build error I am seeing. Can you tell me how I can resolve this error? This is the complete code; the URLs have been omitted. The Spark version used is 1.6.0 and the Scala version is 2.10.5.

Code

import java.net.{HttpURLConnection, URL}
import org.slf4j.LoggerFactory
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.{SparkConf, SparkContext}
import java.util.Arrays
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType._
import scala.Predef.exceptionWrapper

object CoExtract {

    private val logger = LoggerFactory.getLogger(getClass)
    def main(args: Array[String]) {

    val sparkConf = new SparkConf()
    val sc = new SparkContext(sparkConf)
    val sqlcontext = new SQLContext(sc)
//  val sqlcontext = new HiveContext(sc)

    import sqlcontext.implicits._

    val obj = new Connection
    val string_response=obj.getCsvResponse(obj.getConnection(new URL("https://")))

    val array_response = string_response.split("\n")
    logger.info("The length of the array is "+ array_response.length)
    val rdd_response = sc.parallelize(array_response)

    logger.info("The count of elements in the rdd are "+rdd_response.count())

    val header = rdd_response.first()
    logger.info("header" + header)

    val noheaderRDD = rdd_response.filter(_ != header)
    logger.info("NoheaderRDD is" + noheaderRDD.first())

    val subsetRdd = noheaderRDD.map(x => Row(
      x.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1)(0),
      x.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1)(1),
      x.split(",")(2)
    ))

    val x = subsetRdd.zipWithIndex().collect()

    val schema = new StructType()
      .add(StructField("Email", StringType, true))
      .add(StructField("Recipient", StringType, true))
      .add(StructField("rowid", LongType, false))

    val rdd_to_df = sqlcontext.createDataFrame(subsetRdd, schema)
    val df_to_rdd_again = rdd_to_df.rdd.zipWithIndex

    rdd_to_df.withColumn("rowid", row_number.over(Window.partitionBy(lit(1)).orderBy(lit(1))))
    val final_df = sqlcontext.createDataFrame(df_to_rdd_again.map { case (row, index) => Row.fromSeq(row.toSeq ++ Seq(index)) }, schema)

    val start: Int = 0
    val end: Int = rdd_to_df.count().toInt
    var counter: Int = start

    logger.info("Final Count" + end)
    logger.info("The schema of the dataframe is " + rdd_to_df.printSchema())
    final_df.show(100, false)

    logger.info("Schema of rdd to df" + rdd_to_df.printSchema())
    logger.info("schema of final_df" + final_df.printSchema())

    val df_response = sqlcontext.read.format("com.databricks.spark.csv").option("header", "true").load("hdfs:///")

    logger.info("The schema of the dataframe is " + df_response)
    logger.info("The count of the dataframe is " + df_response.count())

    }
}

Build Error

scala:48: error: type mismatch;
[ERROR]  found   : Array[String]
[ERROR]  required: Seq[?]
[ERROR] Error occurred in an application involving default arguments.
[INFO]     val rdd_response=sc.parallelize(array_response)
[INFO]                                     ^

Just convert the Array to a Seq:

val rdd_response=sc.parallelize(array_response.toSeq)
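
Why this helps: SparkContext.parallelize is declared as parallelize[T](seq: Seq[T], numSlices: Int = defaultParallelism). Scala would normally apply the implicit wrapRefArray conversion to pass an Array[String] where a Seq[String] is expected, but the compiler note "Error occurred in an application involving default arguments" suggests that the conversion is not applied here because the call also relies on the defaulted numSlices parameter. Converting the array yourself before the call sidesteps the implicit entirely. Below is a minimal, self-contained sketch of the failing and the fixed call; the local[*] master, the app name, and the sample data are placeholders for illustration, not part of the original program.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object ParallelizeFix {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("ParallelizeFix").setMaster("local[*]"))

    // Stand-in for the CSV response fetched over HTTP in the original code.
    val array_response: Array[String] = "email,recipient,id\na@x.com,b@x.com,1".split("\n")

    // Fails to compile on Scala 2.10: found Array[String], required Seq[?].
    // val broken = sc.parallelize(array_response)

    // Compiles: parallelize receives a value that is already a Seq[String].
    val rdd: RDD[String] = sc.parallelize(array_response.toSeq)

    println(rdd.count()) // 2
    sc.stop()
  }
}

array_response.toList or an explicit scala.Predef.wrapRefArray(array_response) would work equally well; the point is to leave no implicit conversion for the compiler to find.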
