
Retrieving Row column values as their Scala type and not Column

What I'm trying to achieve is inferring values for certain DataFrame columns, taking into account the values of each individual row.

.withColumn("date", when(col("date").isNull, lit(new DateTime(col("timestamp").as[Long]).getYear)))

The problem is that I can't wrap my head around how to retrieve, for each Row object, its value for the given column. I've seen other solutions, but they either list the whole set of values for all of the rows, or just get the first value, which isn't what I'm trying to achieve.

Imagine an example DataFrame like this...

 (year, val1, val2, val3, timestamp)
 (null, 10,   12,   null, 123456789)
 (null, 11,   12,   null, 234567897)

And what I want to see after applying individual functions (for example, extracting the year from the timestamp) to each of the rows is...

 (year,                        val1, val2, val3, timestamp)
 (2018 [using DateTime class], 10,   12,   1012, 123456789)
 (2018 [using DateTime class], 12,   12,   1212, 234567897)
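For reference, a DataFrame like the example above could be built as follows. This is a minimal sketch: the column names and sample values come from the question, but the exact schema (nullable `Int` for `year` and `val3`, `Long` for `timestamp`) is an assumption.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("row-values-example")
  .master("local[*]") // assumption: local test session
  .getOrCreate()
import spark.implicits._

// year and val3 start out null and should be filled in per row
val df = Seq(
  (Option.empty[Int], 10, 12, Option.empty[Int], 123456789L),
  (Option.empty[Int], 11, 12, Option.empty[Int], 234567897L)
).toDF("year", "val1", "val2", "val3", "timestamp")
```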

Is there any way of doing this?

That's where UDFs come into play:

import org.joda.time.DateTime // Joda-Time must be on the classpath

val udf_extractYear = udf((ts: Long) => new DateTime(ts).getYear)

Then you can use this with, e.g.:

df
.withColumn("year", when(col("year").isNull, udf_extractYear(col("timestamp"))).otherwise(col("year")))
.show()

As you can see, your timestamp column is automatically mapped to Long.
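Putting the pieces together, a self-contained sketch of the whole answer might look like this. It assumes Joda-Time is available (as in the question) and that `new DateTime(ts)` interprets the timestamp as epoch milliseconds; the SparkSession setup and sample data are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf, when}
import org.joda.time.DateTime

val spark = SparkSession.builder()
  .master("local[*]") // assumption: local test session
  .getOrCreate()
import spark.implicits._

// Sample rows with a null year to be filled in per row
val df = Seq(
  (Option.empty[Int], 10, 12, 123456789L),
  (Option.empty[Int], 11, 12, 234567897L)
).toDF("year", "val1", "val2", "timestamp")

// The UDF receives each row's timestamp as its Scala type (Long),
// which is exactly what plain Column expressions don't give you
val udf_extractYear = udf((ts: Long) => new DateTime(ts).getYear)

df
  .withColumn("year",
    when(col("year").isNull, udf_extractYear(col("timestamp")))
      .otherwise(col("year")))
  .show()
```

Note that `when(...).otherwise(...)` keeps any pre-existing `year` values intact and only computes the UDF for rows where `year` is null.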

