![](/img/trans.png)
[英]Spark Scala: Why am I getting an option error when I specify getOrElse?
[英]I am facing error when I create dataset in Spark
錯誤:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
case class Drug(S_No: int,Name: string,Drug_Name: string,Gender: string,Drug_Value: int)
scala> val ds=spark.read.csv("file:///home/xxx/drug_detail.csv").as[Drug]
org.apache.spark.sql.AnalysisException: cannot resolve '`S_No`' given input columns: [_c1, _c2, _c3, _c4, _c0];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:110)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$3.applyOrElse(CheckAnalysis.scala:107)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:278)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:277)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:275)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:326)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
這是我的測試數據:
1,Brandon Buckner,avil,female,525
2,Veda Hopkins,avil,male,633
3,Zia Underwood,paracetamol,male,980
4,Austin Mayer,paracetamol,female,338
5,Mara Higgins,avil,female,153
6,Sybill Crosby,avil,male,193
7,Tyler Rosales,paracetamol,male,778
8,Ivan Hale,avil,female,454
9,Alika Gilmore,paracetamol,female,833
10,Len Burgess,metacin,male,325
用於 :
val ds=spark.read.option("header", "true").csv("file:///home/xxx/drug_detail.csv").as[Drug]
如果您的 csv 文件包含標題,則可能包含 option("header","true")。
例如: spark.read.option("header", "true").csv("...").as[Drug]
使用sql encoders
生成structtype
模式,然后在讀取 csv 文件時傳遞schema
,並將 case 類中的類型定義為Int,String
而不是小寫int,string
。
Example:
Sample data:
cat drug_detail.csv
1,foo,bar,M,2
2,foo1,bar1,F,3
Spark-shell:
case class Drug(S_No: Int,Name: String,Drug_Name: String,Gender: String,Drug_Value: Int)
import org.apache.spark.sql.Encoders
val schema = Encoders.product[Drug].schema
val ds=spark.read.schema(schema).csv("file:///home/xxx/drug_detail.csv").as[Drug]
ds.show()
//+----+----+---------+------+----------+
//|S_No|Name|Drug_Name|Gender|Drug_Value|
//+----+----+---------+------+----------+
//| 1| foo| bar| M| 2|
//| 2|foo1| bar1| F| 3|
//+----+----+---------+------+----------+
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.