
Pyspark - from long to wide with new column names

I have this DataFrame:

data = [{"name": "test", "sentiment":'positive', "avg":13.65, "stddev":15.24},
{"name": "test", "sentiment":'neutral', "avg":338.74, "stddev":187.27},
{"name": "test", "sentiment":'negative', "avg":54.58, "stddev":50.19}]

df = spark.createDataFrame(data).select("name", "sentiment", "avg", "stddev")
df.show()
      +----+---------+------+------+
      |name|sentiment|   avg|stddev|
      +----+---------+------+------+
      |test| positive| 13.65| 15.24|
      |test|  neutral|338.74|187.27|
      |test| negative| 54.58| 50.19|
      +----+---------+------+------+

I want to create a DataFrame with this structure:

+----+------------+-----------+------------+------------+-----------+------------+
|name|avg_positive|avg_neutral|avg_negative|std_positive|std_neutral|std_negative|
+----+------------+-----------+------------+------------+-----------+------------+
|test|       13.65|     338.74|       54.58|       15.24|     187.27|       50.19|
+----+------------+-----------+------------+------------+-----------+------------+

I also don't know what this operation is called, so feel free to suggest a more suitable title. Thanks!

Use groupBy() together with pivot():

from pyspark.sql import functions as F

df_grp = df.groupBy("name").pivot("sentiment").agg(F.first("avg").alias("avg"), F.first("stddev").alias("stddev"))
df_grp.show()
+----+------------+---------------+-----------+--------------+------------+---------------+
|name|negative_avg|negative_stddev|neutral_avg|neutral_stddev|positive_avg|positive_stddev|
+----+------------+---------------+-----------+--------------+------------+---------------+
|test|       54.58|          50.19|     338.74|        187.27|       13.65|          15.24|
+----+------------+---------------+-----------+--------------+------------+---------------+

If you really want to rename the columns, this can be done after the pivot.
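The renaming code from the original answer did not survive on this page, so the following is only a minimal sketch of one way to do it, not the answerer's own code. It assumes the pivoted column names look like `negative_avg` and `negative_stddev`, as in the output above. The helper names `wide_name`, `rename_pivoted` and `STAT_PREFIX` are hypothetical; the only PySpark calls used are `DataFrame.columns`, `DataFrame.toDF` and `DataFrame.select`.

```python
# Hypothetical helpers: none of these names come from the original answer.
# Maps the statistic suffix produced by the pivot to the prefix asked for in the question.
STAT_PREFIX = {"avg": "avg", "stddev": "std"}


def wide_name(col):
    """Turn 'negative_avg' into 'avg_negative'; leave other names (e.g. 'name') unchanged."""
    sentiment, sep, stat = col.rpartition("_")
    if sep and stat in STAT_PREFIX:
        return f"{STAT_PREFIX[stat]}_{sentiment}"
    return col


def rename_pivoted(df_grp, sentiments=("positive", "neutral", "negative")):
    """Rename the pivoted columns, then select them in the order shown in the question."""
    # toDF(*names) returns a new DataFrame with the columns renamed positionally.
    renamed = df_grp.toDF(*[wide_name(c) for c in df_grp.columns])
    # Assumes the grouping column is called 'name', as in the question.
    ordered = ["name"] + [f"{p}_{s}" for p in ("avg", "std") for s in sentiments]
    return renamed.select(*ordered)
```

With the `df_grp` from the answer, `rename_pivoted(df_grp).show()` should print the columns in the order `name, avg_positive, avg_neutral, avg_negative, std_positive, std_neutral, std_negative`. Keeping the name mapping in a plain function makes it easy to check without a Spark session.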

