
Extract a column value from a spark dataframe and add it to another dataframe

I have a Spark dataframe called "df_array" that always returns a single array as output, like below.

arr_value
[M,J,K]

I want to extract its value and add it to another dataframe. Below is the code I was executing:

val new_df = old_df.withColumn("new_array_value", df_array.col("UNCP_ORIG_BPR"))

but my code always fails with "org.apache.spark.sql.AnalysisException: resolved attribute(s)"

Can someone help me with this?

The operation needed here is a join.

You'll need a common column in both dataframes, which will be used as the "key".

After the join, you can select which columns to include in the new dataframe.

More details can be found here: https://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html

join(other, on=None, how=None)

Joins with another DataFrame, using the given join expression.
Parameters: 

    other – Right side of the join
    on – a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
    how – str, default ‘inner’. One of inner, outer, left_outer, right_outer, leftsemi.

The following performs a full outer join between df1 and df2.

>>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
[Row(name=None, height=80), Row(name=u'Bob', height=85), Row(name=u'Alice', height=None)]
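For the case in the question, where df_array holds exactly one row and the two dataframes share no key column, one way to apply the join idea might be a cross join, which pairs every row of old_df with that single row. The following is only a sketch under that single-row assumption; the object name, the helper attachSingleRow, and the sample data are hypothetical stand-ins, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CrossJoinSketch {
  // Attaches every column of a single-row DataFrame to each row of `base`.
  // Assumption: `singleRow` has exactly one row. With n rows, the result
  // would contain n copies of each row of `base`.
  def attachSingleRow(base: DataFrame, singleRow: DataFrame): DataFrame =
    base.crossJoin(singleRow)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cross-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the dataframes described in the question.
    val old_df = Seq((1, "a"), (2, "b")).toDF("old_col1", "old_col2")
    val df_array = Seq(Tuple1(Seq("M", "J", "K"))).toDF("arr_value")

    // Cross join, then rename the array column to the desired name.
    val new_df = attachSingleRow(old_df, df_array)
      .withColumnRenamed("arr_value", "new_array_value")
    new_df.show()

    spark.stop()
  }
}
```

Because no key is involved, this avoids adding an artificial common column just to satisfy an equi-join, but it is only safe when the right side is known to be a single row.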

If you know that df_array has only one record, you can collect it to the driver using first() and then use it as an array of literal values to create a column in any DataFrame:

import org.apache.spark.sql.functions._
import scala.collection.mutable // needed for mutable.WrappedArray below

// first - collect that single array to driver (assuming array of strings):
val arrValue = df_array.first().getAs[mutable.WrappedArray[String]](0)

// now use lit() function to create a "constant" value column:
val new_df = old_df.withColumn("new_array_value", array(arrValue.map(lit): _*)) 

new_df.show()
// +--------+--------+---------------+
// |old_col1|old_col2|new_array_value|
// +--------+--------+---------------+
// |       1|       a|      [M, J, K]|
// |       2|       b|      [M, J, K]|
// +--------+--------+---------------+
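
On Spark 2.2 or later, the same constant column could probably also be built with typedLit, which accepts a Scala Seq directly instead of wrapping each element in lit. This is a sketch assuming an array of strings in the first column of df_array; the object name, the helper withConstantArray, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.typedLit

object TypedLitSketch {
  // Adds a constant array<string> column named `name` to every row of `df`.
  def withConstantArray(df: DataFrame, name: String, values: Seq[String]): DataFrame =
    df.withColumn(name, typedLit(values))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-lit-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the dataframes described in the question.
    val old_df = Seq((1, "a"), (2, "b")).toDF("old_col1", "old_col2")
    val df_array = Seq(Tuple1(Seq("M", "J", "K"))).toDF("arr_value")

    // Collect the single array to the driver (assumes exactly one row).
    val arrValue: Seq[String] = df_array.first().getSeq[String](0)

    val new_df = withConstantArray(old_df, "new_array_value", arrValue)
    new_df.show()

    spark.stop()
  }
}
```

Reading the value with getSeq avoids naming the concrete collection class that Spark returns for array columns, which differs between Scala versions.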

