
How to populate a new spark dataframe column with values from specific rows in another column. Need suggestions

My issue is this:

I have a Spark DataFrame that looks like this:
+-----------+---------------+
|         id|           name|
+-----------+---------------+
|          1|         Total:|
|          2|          Male:|
|          3|  Under 5 years|
|          4|   5 to 9 years|
|          5| 10 to 14 years|
|          6|        Female:|
|          7|  Under 5 years|
|          8|   5 to 9 years|
|          9| 10 to 14 years|
+-----------+---------------+

I want to create a new DF with an added column that will look like this:
+-----------+---------------+---------------------+
|         id|           name|             new_name|
+-----------+---------------+---------------------+
|          1|         Total:|               Total:|
|          2|          Male:|                Male:|
|          3|  Under 5 years|  Male: Under 5 years|
|          4|   5 to 9 years|  Male: Under 5 years|
|          5| 10 to 14 years|  Male: Under 5 years|
|          6|        Female:|              Female:|
|          7|  Under 5 years|Female: Under 5 years|
|          8|   5 to 9 years|Female: Under 5 years|
|          9| 10 to 14 years|Female: Under 5 years|
+-----------+---------------+---------------------+

I don't have any code worth showing; I'm looking for ways to approach the problem. I assume it would be something like:

val dfB = dfA.withColumn("new_name", aUDF($"name"))

I'm assuming the solution will need some kind of UDF. I assume it needs to loop or map, updating a "prefix" val any time it finds a row with ":" in the name field. But I don't know how to go about doing that. Any ideas would be much appreciated.
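The carry-forward idea described above can be sketched in plain Scala first, without Spark. This is a hypothetical illustration (the Row case class and addPrefix helper are made up for the example): walk the rows in id order, remember the most recent "header" row (one whose name ends with ":"), and prepend it to the rows that follow.

```scala
// Hypothetical sketch of the carry-forward prefix idea, no Spark involved.
case class Row(id: Int, name: String)

def addPrefix(rows: Seq[Row]): Seq[(Row, String)] =
  rows.foldLeft((Option.empty[String], Vector.empty[(Row, String)])) {
    case ((prefix, acc), row) =>
      if (row.name.endsWith(":"))
        // Header row: it becomes the new prefix and keeps its own name.
        (Some(row.name), acc :+ (row -> row.name))
      else
        // Detail row: prepend the current prefix, if one has been seen.
        (prefix, acc :+ (row -> prefix.fold(row.name)(p => s"$p ${row.name}")))
  }._2

val rows = Seq(Row(1, "Total:"), Row(2, "Male:"), Row(3, "Under 5 years"))
addPrefix(rows).map(_._2)  // Vector("Total:", "Male:", "Male: Under 5 years")
```

On Spark the same single pass can be expressed with a window function instead of a fold, as the answer below shows.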

On Spark 2.4.3 you can achieve this by using split and the last window function.

scala> import org.apache.spark.sql.expressions.Window
scala> import org.apache.spark.sql.functions._
scala> var df = spark.createDataFrame(Seq((1, "Total:"), (2, "Male:"), (3, "Under 5 years"), (4, "5 to 9 years"), (5, "10 to 14 years"), (6, "Female:"), (7, "Under 5 years"), (8, "5 to 9 years"), (9, "10 to 14 years"))).toDF("id", "name")

scala> df.show

+---+--------------+
| id|          name|
+---+--------------+
|  1|        Total:|
|  2|         Male:|
|  3| Under 5 years|
|  4|  5 to 9 years|
|  5|10 to 14 years|
|  6|       Female:|
|  7| Under 5 years|
|  8|  5 to 9 years|
|  9|10 to 14 years|
+---+--------------+
scala> var win = Window.orderBy(col("id"))

scala> var df2 = df.withColumn("name_1", last(when(split($"name", ":")(1) === "", $"name"), true).over(win))

Here split($"name", ":")(1) is an empty string only on the header rows (Total:, Male:, Female:), so the when yields the name there and null everywhere else; last with ignoreNulls = true then carries the most recent header forward down the ordered window.

scala> df2.withColumn("name", when($"name" === $"name_1", $"name").otherwise(concat($"name_1", $"name"))).drop($"name_1").show(false)
+---+---------------------+
|id |name                 |
+---+---------------------+
|1  |Total:               |
|2  |Male:                |
|3  |Male:Under 5 years   |
|4  |Male:5 to 9 years    |
|5  |Male:10 to 14 years  |
|6  |Female:              |
|7  |Female:Under 5 years |
|8  |Female:5 to 9 years  |
|9  |Female:10 to 14 years|
+---+---------------------+
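The heart of this answer is last(..., ignoreNulls = true) over a window ordered by id, which is a running "latest non-null value". Its behaviour can be mimicked in plain Scala with scanLeft; this is a hypothetical illustration, not Spark code, and the names are made up for the example:

```scala
// last(..., ignoreNulls = true) over an ordered window is a running
// "latest non-null value" scan. Header rows (ending in ":") yield Some(name),
// mirroring when(split(name, ":")(1) === "", name), which is null elsewhere.
val names = Seq("Total:", "Male:", "Under 5 years", "5 to 9 years")
val headers = names.map(n => if (n.endsWith(":")) Some(n) else None)

// scanLeft keeps the previous value whenever the current one is None.
val lastHeader = headers
  .scanLeft(Option.empty[String])((acc, h) => h.orElse(acc))
  .tail
// lastHeader: Seq(Some("Total:"), Some("Male:"), Some("Male:"), Some("Male:"))
```

One caveat about the real Spark version: Window.orderBy with no partitionBy pulls all rows into a single partition, which Spark will warn about; that is fine for a small lookup table like this one but will not scale to large data.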

