
Retrieving Row column values as their Scala type and not Column

What I'm trying to achieve is inferring values for certain DataFrame columns, taking into account the values of each individual row.

.withColumn("date", when(col("date").isNull, lit(new DateTime(col("timestamp").as[Long]).getYear)))

The problem is that I can't wrap my head around how to retrieve, for each Row object, its value for the given column. I've seen other solutions, but they either list the whole set of values for all of the rows, or just get the first value, which isn't what I'm trying to achieve.

Imagine an example DataFrame like this...

 (year, val1, val2, val3, timestamp)
 (null, 10,   12,   null, 123456789)
 (null, 11,   12,   null, 234567897)

And what I want to see after applying individual functions (for example, extracting the year from the timestamp) to each of the rows is...

 (year,                        val1, val2, val3, timestamp)
 (2018 [using DateTime class], 10,   12,   1012, 123456789)
 (2018 [using DateTime class], 12,   12,   1212, 234567897)
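For reference, a DataFrame like the example above could be built as follows. This is a minimal sketch: the column names and sample values come from the question, but the exact schema (nullable `Int` for `year` and `val3`, `Long` for `timestamp`) is an assumption.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("row-values-example")
  .master("local[*]") // assumption: local test session
  .getOrCreate()
import spark.implicits._

// year and val3 start out null and should be filled in per row
val df = Seq(
  (Option.empty[Int], 10, 12, Option.empty[Int], 123456789L),
  (Option.empty[Int], 11, 12, Option.empty[Int], 234567897L)
).toDF("year", "val1", "val2", "val3", "timestamp")
```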

Is there any way of doing this?

That's where UDFs come into play:

import org.joda.time.DateTime // Joda-Time must be on the classpath

val udf_extractYear = udf((ts: Long) => new DateTime(ts).getYear)

Then you can use this with, e.g.:

df
.withColumn("year", when(col("year").isNull, udf_extractYear(col("timestamp"))).otherwise(col("year")))
.show()

As you can see, your timestamp column is automatically mapped to Long.
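Putting the pieces together, a self-contained sketch of the whole answer might look like this. It assumes Joda-Time is available (as in the question) and that `new DateTime(ts)` interprets the timestamp as epoch milliseconds; the SparkSession setup and sample data are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf, when}
import org.joda.time.DateTime

val spark = SparkSession.builder()
  .master("local[*]") // assumption: local test session
  .getOrCreate()
import spark.implicits._

// Sample rows with a null year to be filled in per row
val df = Seq(
  (Option.empty[Int], 10, 12, 123456789L),
  (Option.empty[Int], 11, 12, 234567897L)
).toDF("year", "val1", "val2", "timestamp")

// The UDF receives each row's timestamp as its Scala type (Long),
// which is exactly what plain Column expressions don't give you
val udf_extractYear = udf((ts: Long) => new DateTime(ts).getYear)

df
  .withColumn("year",
    when(col("year").isNull, udf_extractYear(col("timestamp")))
      .otherwise(col("year")))
  .show()
```

Note that `when(...).otherwise(...)` keeps any pre-existing `year` values intact and only computes the UDF for rows where `year` is null.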

