Replace column values with column values from another table based on a condition?

Question

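I have a dataframe with some missing values in the rows where "new_id" is an empty string. I have another dataframe that holds the values it should have, but it does not contain all of the columns of the original dataframe, so I can't simply replace those rows by index. My original dataframe looks like: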

df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})

id   new_id    color    age   trade    color2   fruit
1               blue    23              red     
2               red     11             yellow
3      23       green   17     C        red      orange
4               yellow  13              blue
5      52       green   51     B        purple   grape

The dataframe with the data I need is:

df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})

id   new_id    trade   fruit
1              B       apple
2              C       orange
4              A       apple

Desired output:

id   new_id    color    age   trade    color2   fruit
1               blue    23     B       red      apple
2               red     11     C       yellow   orange
3      23       green   17     C       red      orange
4               yellow  13     A       blue     apple
5      52       green   51     B       purple   grape

How can I combine the information from the two dataframes to get the complete dataset, replacing values only where "new_id" is an empty string?

Answer 1

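IIUC, you can first join the two data frames on id, using pd.DataFrame.join or pd.merge, to create a temporary data frame that holds the fruit and trade columns from both. You can then use bfill (or combine_first) as the pandas equivalent of coalesce and assign the coalesced columns back to your initial data frame.

For more details on coalesce in pandas, see here.

Code: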

import pandas as pd

# Define data frames
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})

df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})

# Join both data frames by setting index 
df_temp = (
    df.set_index(["id"])[["trade", "fruit"]]
    .join(
        df_map.set_index(["id"])
        .drop(columns=["new_id"])
        .rename(columns=lambda x: "temp_"+x)
        )
    .reset_index(drop=True)
)

# Apply coalesce
df["fruit"], df["trade"] = (
    df_temp[["temp_fruit", "fruit"]].bfill(axis=1).iloc[:, 0], 
    df_temp[["temp_trade", "trade"]].bfill(axis=1).iloc[:, 0]
)

Output:

id  new_id  color   age trade  color2   fruit
0   1       blue    23  B      red      apple
1   2       red     11  C      yellow   orange
2   3   23  green   17  C      red      orange
3   4       yellow  13  A      blue     apple
4   5   52  green   51  B      purple   grape
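
The answer mentions combine_first as an alternative to bfill but does not show it. A minimal sketch of that variant follows, assuming df keeps its default RangeIndex and each id appears at most once in df_map (otherwise the left merge would duplicate rows). The cols list and the "_map" suffix are names chosen here for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"],
})

df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"],
})

cols = ["trade", "fruit"]  # columns to fill (illustrative name)

# A left merge keeps every row of df in its original order; ids that are
# missing from df_map get NaN in the "_map" columns.
merged = df.merge(df_map[["id"] + cols], on="id", how="left", suffixes=("", "_map"))

for col in cols:
    # Take the mapped value where one exists, otherwise keep the original.
    df[col] = merged[col + "_map"].combine_first(merged[col])

print(df)
```

As with the bfill version, this fills from df_map wherever an id matches and does not look at new_id itself; with the sample data the two conditions pick the same rows.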

Answer 2

I found a very simple way to do it using loc.

# Loop over the dataframe that holds the data I need
for i in range(len(df_map)):

    # Get the id value of the current row
    map_id = df_map["id"].iloc[i]

    # Get the index of the first row in the original dataframe with that id
    ind = df.index[df["id"] == map_id].tolist()[0]

    # Overwrite the columns that df_map provides with the values from df_map
    df.loc[ind, list(df_map)] = df_map.iloc[i]
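
The loop above can also be written without iterating row by row. The following is only a sketch under a few assumptions: ids in df_map are unique, only trade and fruit need filling, and rows should be touched only when new_id is an empty string (the condition stated in the question, which the loop does not check explicitly). The names cols, lookup and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"],
})

df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"],
})

cols = ["trade", "fruit"]        # columns to fill (illustrative name)
lookup = df_map.set_index("id")  # id -> replacement values

# Rows to update: new_id is empty AND the id has an entry in df_map.
mask = df["new_id"].eq("") & df["id"].isin(lookup.index)

for col in cols:
    # Map each selected id to its replacement; assignment aligns on df's index.
    df.loc[mask, col] = df.loc[mask, "id"].map(lookup[col])

print(df)
```

Rows whose new_id is already set, or whose id has no entry in df_map, are left as they are.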

Answer 1
0 2022-08-01 18:24:46

Answer 2
0 2022-08-02 13:15:42
