简体   繁体   中英

How to update the data frame column values from another data frame based a conditional match in polars?

I have two data frames as below.

df_names = pl.DataFrame({'last_name':['Williams','Henry','XYX','Smith','David','Freeman','Walter','Test_A'],
                        'first_name':['Henry','Williams','ABC','David','Smith','Walter','Freeman','Test_B']})

在此处输入图像描述

Another frame as,

df_updates = pl.DataFrame({'last_name':['Williams','Henry','Smith','David','Freeman','Walter'],
                        'first_name':['Henry','Williams','David','Smith','Walter','Freeman'],
                        'last_name_x':['Williams','Williams','Smith','Smith','Freeman','Freeman'],
                        'first_name_x':['Henry','Henry','David','David','Walter','Walter']})

在此处输入图像描述

Here I would like to update the last_name and first_name from df_names Dataframe whichever last and first names matched in dataframe- df_updates with last_name_x and first_name_x values.

here is an expected output as:

在此处输入图像描述

# Join dataframe:
#   - on columns of interest.
#   - left join (keep all rows from first dataframe). 
In [120]: (
     ...:     df_names
     ...:     .join(
     ...:         df_updates,
     ...:         on=["last_name", "first_name"],
     ...:         how="left"
     ...:     )
     ...: )
Out[120]: 
shape: (8, 4)
┌───────────┬────────────┬─────────────┬──────────────┐
│ last_name ┆ first_name ┆ last_name_x ┆ first_name_x │
│ ---       ┆ ---        ┆ ---         ┆ ---          │
│ str       ┆ str        ┆ str         ┆ str          │
╞═══════════╪════════════╪═════════════╪══════════════╡
│ Williams  ┆ Henry      ┆ Williams    ┆ Henry        │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Henry     ┆ Williams   ┆ Williams    ┆ Henry        │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ XYX       ┆ ABC        ┆ null        ┆ null         │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Smith     ┆ David      ┆ Smith       ┆ David        │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ David     ┆ Smith      ┆ Smith       ┆ David        │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Freeman   ┆ Walter     ┆ Freeman     ┆ Walter       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Walter    ┆ Freeman    ┆ Freeman     ┆ Walter       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Test_A    ┆ Test_B     ┆ null        ┆ null         │
└───────────┴────────────┴─────────────┴──────────────┘


# Join dataframe:
#   - on columns of interest.
#   - left join (keep all rows from first dataframe). 
# Depending if the last and first name were found the the df_updates table:
#   - take the last and first name form *_ column of df_updates
#   - or take original last and first name from df_names
#   - and only retrain those values (rename columns)
In [121]: (
     ...:     df_names
     ...:     .join(
     ...:         df_updates,
     ...:         on=["last_name", "first_name"],
     ...:         how="left"
     ...:     )
     ...:     .select(
     ...:         [
     ...:             pl.when(
     ...:                 pl.col("last_name_x").is_null()
     ...:             ).then(
     ...:                 pl.col("last_name")
     ...:             ).otherwise(
     ...:                 pl.col("last_name_x")
     ...:             ).alias("last_name"),
     ...:             pl.when(
     ...:                 pl.col("first_name_x").is_null()
     ...:             ).then(
     ...:                 pl.col("first_name")
     ...:             ).otherwise(
     ...:                 pl.col("first_name_x")
     ...:             ).alias("first_name"),
     ...:         ]
     ...:     )
     ...: )
Out[121]: 
shape: (8, 2)
┌───────────┬────────────┐
│ last_name ┆ first_name │
│ ---       ┆ ---        │
│ str       ┆ str        │
╞═══════════╪════════════╡
│ Williams  ┆ Henry      │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Williams  ┆ Henry      │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ XYX       ┆ ABC        │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Smith     ┆ David      │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Smith     ┆ David      │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Freeman   ┆ Walter     │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Freeman   ┆ Walter     │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Test_A    ┆ Test_B     │
└───────────┴────────────┘

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM