
Scala - Apply a function to each value in a dataframe column

I have a function that takes a LocalDate (it could be any other type) and returns a DataFrame, e.g.:

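import java.time.LocalDate
import org.apache.spark.sql.DataFrame
import spark.implicits._ // for toDF; assumes a SparkSession named spark

def genDataFrame(refDate: LocalDate): DataFrame = {
  Seq(
    (refDate,refDate.minusDays(7)),
    (refDate.plusDays(3),refDate.plusDays(7))
  ).toDF("col_A","col_B")
}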

genDataFrame(LocalDate.parse("2021-07-02")) output:

+----------+----------+
|     col_A|     col_B|
+----------+----------+
|2021-07-02|2021-06-25|
|2021-07-05|2021-07-09|
+----------+----------+

I want to apply this function to each element of a DataFrame column (which contains LocalDate values), such as:

val myDate = LocalDate.parse("2021-07-02")

val df = Seq(
  (myDate),
  (myDate.plusDays(1)),
  (myDate.plusDays(3))
).toDF("date")

df:

+----------+
|      date|
+----------+
|2021-07-02|
|2021-07-03|
|2021-07-05|
+----------+

Required output:

+----------+----------+
|     col_A|     col_B|
+----------+----------+
|2021-07-02|2021-06-25|
|2021-07-05|2021-07-09|
|2021-07-03|2021-06-26|
|2021-07-06|2021-07-10|
|2021-07-05|2021-06-28|
|2021-07-08|2021-07-12|
+----------+----------+

How could I achieve that (without using collect)?

You can register your DataFrame as a temporary view (which is lazily evaluated) and query it with Spark SQL:

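import org.apache.spark.sql.functions.col
import spark.implicits._ // encoders needed by map

// Duplicate the date into col_A and col_B, then register the result as a temp view
val df_2 = df.map(x => x.getDate(0).toLocalDate()).withColumnRenamed("value", "col_A")
  .withColumn("col_B", col("col_A"))
df_2.createOrReplaceTempView("test")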

With that you can create a view like this one:

+----------+----------+
|     col_A|     col_B|
+----------+----------+
|2021-07-02|2021-07-02|
|2021-07-03|2021-07-03|
|2021-07-05|2021-07-05|
+----------+----------+

And then you can use SQL, which I find more intuitive:

// UNION removes duplicate rows; use UNION ALL if duplicates must be kept
spark.sql("""SELECT col_A, date_add(col_B, -7) AS col_B FROM test
UNION
SELECT date_add(col_A, 3) AS col_A, date_add(col_B, 7) AS col_B FROM test""")
  .show()

This gives the expected rows as a DataFrame (the row order may differ from the one shown in the question):

+----------+----------+
|     col_A|     col_B|
+----------+----------+
|2021-07-02|2021-06-25|
|2021-07-03|2021-06-26|
|2021-07-05|2021-06-28|
|2021-07-05|2021-07-09|
|2021-07-06|2021-07-10|
|2021-07-08|2021-07-12|
+----------+----------+
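
For comparison, here is a minimal sketch of the same per-date expansion written with flatMap on plain Scala collections, so it runs without Spark. The helper name rowsFor is hypothetical (it is not in the question); it returns the two rows that genDataFrame builds for one date, but as tuples rather than a DataFrame.

```scala
import java.time.LocalDate

// Hypothetical helper: the two rows genDataFrame produces for one date,
// returned as plain tuples instead of a DataFrame.
def rowsFor(refDate: LocalDate): Seq[(LocalDate, LocalDate)] = Seq(
  (refDate, refDate.minusDays(7)),
  (refDate.plusDays(3), refDate.plusDays(7))
)

val myDate = LocalDate.parse("2021-07-02")
val dates = Seq(myDate, myDate.plusDays(1), myDate.plusDays(3))

// flatMap applies rowsFor to every date and concatenates the results in order.
val result: Seq[(LocalDate, LocalDate)] = dates.flatMap(rowsFor)

result.foreach { case (a, b) => println(s"$a $b") }
```

On Spark 3.x, which provides an encoder for java.time.LocalDate, the same function could probably be passed to Dataset.flatMap, along the lines of df.as[LocalDate].flatMap(rowsFor).toDF("col_A", "col_B"). That variant is untested here and assumes spark.implicits._ is in scope.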
