DataType verification on a DataFrame in Scala
I need to validate the data types of a DataFrame.
Sample DF:
import spark.implicits._ // needed for toDF; already in scope in spark-shell
val rawData = Seq((1, "First Rec Col 1", "First Rec Col 2"), (1, "Second Rec Col 1", "Second Rec Col 2")).toDF("Raw_PK", "Col1", "Col2")
rawData.show
Result:
Here is my schema:
val types = Seq(("Col1", "string"), ("Col2", "double"))
It says Col1 should be of type String and Col2 should be of type Double.
What have I tried?
I tried a couple of approaches (the traditional way of looping), but I want to get rid of that. So here is what I did:
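import org.apache.spark.sql.functions.col
// Cast every column listed in `types` to its expected type
val df2 = rawData.select(types.map { case (c, t) => col(c).cast(t) }: _*)
df2.show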
This tries to cast Col2 from String to Double, and the result shows null in Col2.
What I want instead is for it to add another column saying that the row is not a valid record to process.
Any help? Thanks in advance.
You can use the techniques described here: https://gist.github.com/dennyglee/c21f59cf81216c1dc9a38525a0e41de1
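Separately from the linked gist (its contents are not reproduced here), the extra column the question asks for could be sketched as follows. The idea is that a value is invalid when it is non-null but casting it to the expected type yields null. The column name `is_valid`, the helper names, and the second sample row (changed to "3.14" so that one row passes) are assumptions made for illustration. The sketch also assumes ANSI mode is off, as in the question where the cast returned null; with `spark.sql.ansi.enabled=true` an invalid cast raises an error instead.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ValidateTypes {
  // A value fails validation when it is non-null but casting it
  // to the target type yields null (non-ANSI cast behaviour assumed).
  def castFails(name: String, targetType: String): Column =
    col(name).isNotNull && col(name).cast(targetType).isNull

  // Adds a boolean column (hypothetical name "is_valid") that is false
  // when at least one of the listed columns fails its cast.
  def withValidity(df: DataFrame, types: Seq[(String, String)]): DataFrame = {
    val anyFailure = types
      .map { case (c, t) => castFails(c, t) }
      .foldLeft(lit(false))(_ || _)
    df.withColumn("is_valid", !anyFailure)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("validate-types")
      .getOrCreate()
    import spark.implicits._

    // Sample data modelled on the question; the second row is altered
    // so that its Col2 value can be cast to double.
    val rawData = Seq(
      (1, "First Rec Col 1", "First Rec Col 2"),
      (2, "Second Rec Col 1", "3.14")
    ).toDF("Raw_PK", "Col1", "Col2")
    val types = Seq(("Col1", "string"), ("Col2", "double"))

    withValidity(rawData, types).show(false)
    spark.stop()
  }
}
```

With this sample data the first row should be flagged `is_valid = false` (its Col2 text is not a number) and the second `true`. The original columns are left uncast, so the invalid values stay visible for inspection; genuinely null inputs are treated as valid here, which may or may not be what you want.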
Using pattern matching:
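import org.apache.spark.sql.types.IntegerType
// col1 is assumed to be a String holding the column name;
// schema(name) returns that column's StructField
assert(testDF.schema(col1).dataType match {
  case IntegerType => true
  case _ => false
})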
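
The check above covers one column. A minimal sketch of how the same idea might be extended to the whole `types` list from the question is given below. The object and method names (`SchemaCheck`, `parseType`, `mismatches`) are hypothetical, and the name-to-type mapping only covers the few type names used here. Note that this compares the declared schema only; it does not look at the values in each row.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, DoubleType, IntegerType, StringType}

object SchemaCheck {
  // Maps the type names used in the question's `types` list to Spark
  // DataType objects. Only a few names are covered in this sketch.
  def parseType(name: String): DataType = name.toLowerCase match {
    case "string"          => StringType
    case "double"          => DoubleType
    case "int" | "integer" => IntegerType
    case other =>
      throw new IllegalArgumentException(s"Unsupported type name: $other")
  }

  // Returns (column, expected, actual) for every listed column whose
  // declared type differs from the expected one. A column missing from
  // the DataFrame makes df.schema(c) throw IllegalArgumentException.
  def mismatches(
      df: DataFrame,
      types: Seq[(String, String)]
  ): Seq[(String, DataType, DataType)] =
    types.flatMap { case (c, t) =>
      val expected = parseType(t)
      val actual = df.schema(c).dataType
      if (actual == expected) None else Some((c, expected, actual))
    }
}
```

For the question's `rawData` and `types`, `SchemaCheck.mismatches(rawData, types)` would report only Col2, since it is declared as a string but expected to be a double. An empty result means the declared schema already matches.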