
Datatype validation of Spark columns in for loop - Spark DataFrame

I'm trying to validate the datatypes of a DataFrame's columns before entering a loop in which I perform SQL calculations, but the datatype validation never passes, so execution never gets inside the loop. The operation needs to be performed only on numeric columns.

How can this be solved? Is this the right way to handle datatype validation?

//get datatype of dataframe fields
val datatypes =  parquetRDD_subset.schema.fields

//check if datatype of column is String and enter the loop for calculations.

for (val_datatype <- datatypes if val_datatype.dataType =="StringType") 
{
    val dfs = x.map(field => spark.sql(s"select * from table"))
    val withSum = dfs.reduce((x, y) => x.union(y)).distinct()
}

You are comparing dataType to a string, which will never be true (for me the compiler complains that the two types are unrelated). dataType is an object whose type is a subtype of org.apache.spark.sql.types.DataType.

Try replacing your for with:

for (val_datatype <- datatypes if val_datatype.dataType.isInstanceOf[StringType]) 

In any case, your for loop does nothing but declare the vals; it never does anything with them.
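Since the question states the calculation should run only on numeric columns, a more direct approach is to filter the schema fields by NumericType (the common supertype of IntegerType, LongType, DoubleType, etc.) rather than looping with a guard. The sketch below assumes parquetRDD_subset is the DataFrame from the question and that summing each numeric column stands in for the real SQL calculation:

```scala
import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.sql.types.NumericType

// Collect the names of all numeric columns from the schema.
val numericCols = parquetRDD_subset.schema.fields
  .filter(_.dataType.isInstanceOf[NumericType])
  .map(_.name)

// Example calculation on the numeric columns only:
// sum each one in a single select (replace with your own SQL logic).
val sums = parquetRDD_subset.select(
  numericCols.map(c => sum(col(c)).alias(s"sum_$c")): _*
)
sums.show()
```

Note that the result of the select is assigned and used (here via show()); a val declared inside a for body and never returned is simply discarded, which is why the original loop appeared to do nothing.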
