简体   繁体   中英

How can I slide over values from specific columns to specific columns within the same dataset?

I have a post-joined datasets where the columns are identical except the right side has new and corrected data and a.TODROP suffix appended at the end of the column's name.

So the dataset looks something like this:

df = spark.createDataFrame(
[
    (1, "Mary", 133, "Pizza", "Mary", 13, "Pizza"),
    (2, "Jimmy", 8, "Hamburger", None, None, None),
    (3, None, None, None, "Carl", 6, "Cake")
],
["guid", "name", "age", "fav_food", "name.TODROP", "age.TODROP", "fav_food.TODROP"]
)

在此处输入图像描述

I'm trying to slide over the right side columns to the left side columns if there is value:

if df['name.TODROP'].isNotNull():
    df['name'] = df['name.TODROP']

if df['age.TODROP'].isNotNull():
    df['age'] = df['age.TODROP']

if df['fav_food.TODROP'].isNotNull():
    df['fav_food'] = df['fav_food.TODROP']

However, the problem is that the brute-force solution will take a lot longer with my real dataset because it has a lot more columns than this example. And I'm also getting this error so it wasn't working out anyway...

"pyspark.sql.utils.AnalysisException: Can't extract value from name#1527: need struct type but got string;"

Another attempt where I try to do it in a loop:

col_list = []
suffix = ".TODROP"
for x in df.columns:
    if x.endswith(suffix) == False:
        col_list.append(x)

for x in col_list:
    df[x] = df[x + suffix]

Same error as above.

Goal:

在此处输入图像描述

Can someone point me in the right direction? Thank you.

First of all, your dot representation of the column name makes confusion for the struct type of column. Be aware that. I have concatenate the column name with backtick and it prevents the misleading column type.

suffix = '.TODROP'
cols1 = list(filter(lambda c: not(c.endswith(suffix)), df.columns))
cols2 = list(filter(lambda c: c.endswith(suffix), df.columns))

for c in cols1[1:]:
    df = df.withColumn(c, f.coalesce(f.col(c), f.col('`' + c + suffix + '`')))

df = df.drop(*cols2)
df.show()

+----+-----+---+---------+
|guid| name|age| fav_food|
+----+-----+---+---------+
|   1| Mary|133|    Pizza|
|   2|Jimmy|  8|Hamburger|
|   3| Carl|  6|     Cake|
+----+-----+---+---------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM