
How would I merge these two dataframes to produce the third dataframe in Spark Scala?

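I'm having trouble joining these two dataframes because I can't modify specific column values in Spark Scala. I think I need some kind of transpose followed by a join, but I can't figure out how to do it.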

Here is the first dataframe:

  var sample_df = Seq(("john","morning","7am"),("john","night","10pm"),("bob","morning","8am"),("bob","night","11pm"),("phil","morning","9am"),("phil","night","10pm")).toDF("person","time_of_day","wake/sleep hour")


Here is the second dataframe:

  var sample_df2 = Seq(("john","6am","11pm"),("bob","7am","2am"),("phil","8am","1am")).toDF("person","morning_earliest","night_latest")


Here is the resulting dataframe I want to produce:

  var resulting_df = Seq(("john","morning","7am","6am"),("john","night","10pm","11pm"),("bob","morning","8am","7am"),("bob","night","11pm","2am"),("phil","morning","9am","8am"),("phil","night","10pm","1am")).toDF("person","time_of_day","wake/sleep hour","earliest/latest")


Any help would be greatly appreciated! Thanks, and have a great day!

sample_df.createOrReplaceTempView("df1")
sample_df2.createOrReplaceTempView("df2")

spark.sql("""
select person, time_of_day, `wake/sleep hour`, `earliest/latest`
from (
    select person, stack(2, 'morning', morning_earliest, 'night', night_latest) as (time_of_day, `earliest/latest`)
    from df2
) df
join df1
using (time_of_day, person)
""").show()

+------+-----------+---------------+---------------+
|person|time_of_day|wake/sleep hour|earliest/latest|
+------+-----------+---------------+---------------+
|  john|    morning|            7am|            6am|
|  john|      night|           10pm|           11pm|
|   bob|    morning|            8am|            7am|
|   bob|      night|           11pm|            2am|
|  phil|    morning|            9am|            8am|
|  phil|      night|           10pm|            1am|
+------+-----------+---------------+---------------+
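The same unpivot-then-join idea can also be written with the DataFrame API instead of temp views. The following is only a sketch, not part of the original answer: it assumes a local `SparkSession` (in `spark-shell` the `spark` value already exists, so those lines can be skipped), and the names `unpivoted` and `result` are made up for illustration. The `stack` expression is the one used in the SQL query above, passed through `selectExpr`.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("unpivot-join-sketch")
  .getOrCreate()
import spark.implicits._

// The two input dataframes from the question.
val sample_df = Seq(
  ("john", "morning", "7am"), ("john", "night", "10pm"),
  ("bob", "morning", "8am"), ("bob", "night", "11pm"),
  ("phil", "morning", "9am"), ("phil", "night", "10pm")
).toDF("person", "time_of_day", "wake/sleep hour")

val sample_df2 = Seq(
  ("john", "6am", "11pm"), ("bob", "7am", "2am"), ("phil", "8am", "1am")
).toDF("person", "morning_earliest", "night_latest")

// Unpivot the wide dataframe: one row per (person, time_of_day).
val unpivoted = sample_df2.selectExpr(
  "person",
  "stack(2, 'morning', morning_earliest, 'night', night_latest) as (time_of_day, `earliest/latest`)"
)

// Join on both keys; the key columns appear only once in the output.
val result = sample_df.join(unpivoted, Seq("person", "time_of_day"))
result.show()
```

Joining on a `Seq` of column names plays the role of `using (time_of_day, person)` in the SQL version, so no duplicate key columns need to be dropped afterwards.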
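
// Needed outside spark-shell; `$` comes from spark.implicits._
import org.apache.spark.sql.functions.{col, when}

// Attach both candidate columns to every row of the first dataframe.
val df = sample_df.join(sample_df2, "person")

// Pick the column that matches time_of_day, then drop the two helpers.
val resulting_df = df
  .withColumn("earliest/latest",
    when(col("time_of_day") === "morning", $"morning_earliest")
      .otherwise($"night_latest"))
  .drop($"morning_earliest")
  .drop($"night_latest")

resulting_df.show()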
