Spark DataFrame row- and column-level operations using Scala
Original DataFrame:
0.2 0.3 0.2 0.3
+------+---------+
| name | country |
+------+---------+
| Raju | UAS     |
| Ram  | Pak.    |
| null | China   |
| null | null    |
+------+---------+
I need this:
+------+--------+
| Nwet | wetCon |
+------+--------+
| 0.2  | 0.3    |
| 0.2  | 0.3    |
| 0.0  | 0.3    |
| 0.0  | 0.0    |
+------+--------+
I want to create a single UDF for both columns. Applied to the name column, it should return 0.2 if the value is not null and 0.0 otherwise. Applied to the country column, the same UDF should return 0.3 if the value is not null and 0.0 if it is null.
Using StringUtils from Apache Commons Lang:
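import org.apache.commons.lang3.StringUtils
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

// 0.2 when name is present, 0.0 when it is null, empty, or whitespace-only
val transcodificationName: UserDefinedFunction =
  udf { (name: String) =>
    if (StringUtils.isBlank(name)) 0.0 else 0.2
  }

// 0.3 when country is present, 0.0 otherwise
val transcodificationCountry: UserDefinedFunction =
  udf { (country: String) =>
    if (StringUtils.isBlank(country)) 0.0 else 0.3
  }

// The UDFs already return Double, so no cast is needed
dataframe
  .withColumn("Nwet", transcodificationName(col("name")))
  .withColumn("wetCon", transcodificationCountry(col("country")))
  .select("Nwet", "wetCon")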
Edit (a single UDF for both columns):
import org.apache.spark.sql.functions.lit

// One UDF for both columns; the second argument selects the weight
val transcodificationColumns: UserDefinedFunction =
  udf { (input: String, columnName: String) =>
    if (StringUtils.isBlank(input)) 0.0
    else if (columnName == "name") 0.2
    else if (columnName == "country") 0.3
    else 0.0
  }

// A UDF only accepts Column arguments, so the column name is wrapped in lit(...)
dataframe
  .withColumn("Nwet", transcodificationColumns(col("name"), lit("name")))
  .withColumn("wetCon", transcodificationColumns(col("country"), lit("country")))
  .select("Nwet", "wetCon")
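If a UDF is not strictly required, the same null check can probably be written with Spark's built-in `when`/`otherwise` column functions, which Spark can optimize better than an opaque UDF. The following is only a sketch: the object name `NullWeights`, the helper `weightIfPresent`, and the local `SparkSession` setup are assumptions made for illustration, not part of the question or the answer above.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object NullWeights {
  // Hypothetical helper: `weight` if the column value is not null, else 0.0.
  def weightIfPresent(c: Column, weight: Double): Column =
    when(c.isNotNull, lit(weight)).otherwise(lit(0.0))

  // Builds the two weight columns from the question's name/country columns.
  def addWeights(df: DataFrame): DataFrame =
    df.select(
      weightIfPresent(col("name"), 0.2).as("Nwet"),
      weightIfPresent(col("country"), 0.3).as("wetCon")
    )

  def main(args: Array[String]): Unit = {
    // Local session for the demo only; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-weights")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question; None becomes a SQL null.
    val df = Seq[(Option[String], Option[String])](
      (Some("Raju"), Some("UAS")),
      (Some("Ram"), Some("Pak.")),
      (None, Some("China")),
      (None, None)
    ).toDF("name", "country")

    addWeights(df).show()
    spark.stop()
  }
}
```

One difference from the UDF version: `isNotNull` only tests for null, whereas `StringUtils.isBlank` also treats empty and whitespace-only strings as missing. If blank strings should also map to 0.0, the condition would need an extra check on the trimmed value.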