
How to map Spark DataFrame row values to columns?

I am trying to map the values in the rows of one DataFrame to columns in another DataFrame.

I have the following DataFrames; the values in "id" are known to be unique:

sqlContext.createDataFrame(Seq(("a", 1),("b",2))).toDF("id","number")


sqlContext.createDataFrame(Seq(("jane",10),("John",12))).toDF("mcid", "age")

And I wish to produce a DataFrame with the schema:

| mcid | age | a | b |

I'm not sure exactly what you are trying to do, but assuming you have this:

val df1 = sqlContext.createDataFrame(Seq(("a", 1),("b",2))).toDF("id","number")
val df2 = sqlContext.createDataFrame(Seq(("jane",10),("John",12))).toDF("mcid", "age")

This will get you a DataFrame with the schema you are looking for:

df2.join(df1).groupBy($"mcid", $"age").pivot("id").sum("number")
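
The join above has no condition, so it is a Cartesian product. On Spark 2.x an implicit Cartesian product can be rejected unless spark.sql.crossJoin.enabled is set, so a version with an explicit crossJoin may be safer there. The following is a minimal sketch, assuming Spark 2.1 or later and a local SparkSession (the names spark and result and the app name are my own, not from the question); in spark-shell the session and its implicits already exist, so those lines can be skipped:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.1+ running locally; the question itself uses the older sqlContext.
val spark = SparkSession.builder().master("local[*]").appName("pivot-sketch").getOrCreate()
import spark.implicits._ // enables toDF on a Seq and the $"col" syntax

val df1 = Seq(("a", 1), ("b", 2)).toDF("id", "number")
val df2 = Seq(("jane", 10), ("John", 12)).toDF("mcid", "age")

// Explicit Cartesian product: every df2 row is paired with every df1 row.
val result = df2.crossJoin(df1)
  .groupBy($"mcid", $"age")
  .pivot("id", Seq("a", "b")) // listing the pivot values avoids an extra job to discover them
  .sum("number")

result.printSchema() // columns: mcid, age, a, b
```

Because every person is paired with every id, each row of the result carries the same values in a and b (1 and 2 for this data). Passing the pivot values explicitly is optional; pivot("id") alone also works but makes Spark compute the distinct ids first.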

