
Replace column value with column value in another table based on condition?

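I have a dataframe with some missing values, where "new_id" is an empty string. I have another dataframe that contains the values it should have, but it doesn't have all the columns of the original dataframe, so I can't simply replace those rows by index. My original dataframe looks like: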

df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})
id   new_id    color    age   trade    color2   fruit
1               blue    23              red     
2               red     11             yellow
3      23       green   17     C        red      orange
4               yellow  13              blue
5      52       green   51     B        purple   grape

The dataframe with the data I need is:

df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})
id   new_id    trade   fruit
1              B       apple
2              C       orange
4              A       apple

Desired output:

id   new_id    color    age   trade    color2   fruit
1               blue    23     B       red      apple
2               red     11     C       yellow   orange
3      23       green   17     C       red      orange
4               yellow  13     A       blue     apple
5      52       green   51     B       purple   grape

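How can I combine the information from the two dataframes to get the complete dataset, replacing values only in rows where "new_id" is an empty string?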

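IIUC, you can first join the two dataframes on id, using pd.DataFrame.join or pd.merge, to create a temporary dataframe that holds the fruit and trade columns from both dataframes. Then you can apply a pandas version of coalesce with bfill (or combine_first) and assign the coalesced values back to your initial dataframe.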

For more details on coalesce in pandas, see here.
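The answer names pd.merge as an alternative to join. A minimal sketch of that route, which also applies the question's condition explicitly (only rows where new_id is an empty string get filled). It assumes ids in df_map are unique and that trade and fruit are the only columns to fill; the `cols` list, the `needs_fill`/`take_map` masks and the "_map" suffix are illustrative names, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"],
})
df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"],
})

cols = ["trade", "fruit"]  # assumption: the only columns to fill

# A left merge on id keeps every row of df in its original order; the
# overlapping columns from df_map get a "_map" suffix (NaN where an id
# has no mapping row). validate= raises if an id appears twice in df_map.
merged = df.merge(
    df_map[["id"] + cols],
    on="id",
    how="left",
    suffixes=("", "_map"),
    validate="many_to_one",
)

# Rows eligible for replacement: new_id is an empty string
needs_fill = merged["new_id"].eq("")

for col in cols:
    # Take the mapped value only where new_id is empty and a mapping exists
    take_map = needs_fill & merged[col + "_map"].notna()
    # where() keeps the mapped value where take_map is True, else the original
    df[col] = merged[col + "_map"].where(take_map, merged[col]).to_numpy()

print(df)
```

Because how="left" preserves the row order of df, the merged columns can be written back positionally with `.to_numpy()`.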

Code:

import pandas as pd

# Define data frames
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})

df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})

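# Left-join the mapping onto df by id; the mapping's columns get a "temp_"
# prefix so they don't clash with df's own trade/fruit columns
df_temp = (
    df.set_index(["id"])[["trade", "fruit"]]
    .join(
        df_map.set_index(["id"])
        .drop(columns=["new_id"])
        .rename(columns=lambda x: "temp_"+x)
        )
    .reset_index(drop=True)  # back to 0..n-1 so it lines up with df's default index
)

# Apply coalesce: bfill(axis=1) fills a NaN in temp_* from the column to its
# right, so column 0 holds the mapped value if present, else the original one
df["fruit"], df["trade"] = (
    df_temp[["temp_fruit", "fruit"]].bfill(axis=1).iloc[:, 0],
    df_temp[["temp_trade", "trade"]].bfill(axis=1).iloc[:, 0]
)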

Output:

   id  new_id  color   age  trade  color2  fruit
0  1           blue    23   B      red     apple
1  2           red     11   C      yellow  orange
2  3   23      green   17   C      red     orange
3  4           yellow  13   A      blue    apple
4  5   52      green   51   B      purple  grape
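The answer also mentions combine_first as an alternative to bfill. A minimal sketch of that variant, assuming ids are unique in both frames: combine_first only fills missing values (NaN/None), not empty strings, so the "" entries in df's trade and fruit columns are turned into NaN first. Values already present in df then take priority over those in df_map. The `cols`, `left`, `right` and `filled` names are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"],
})
df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"],
})

cols = ["trade", "fruit"]  # assumption: the only columns to fill

# Index both frames by id; mark the empty strings in df as missing
left = df.set_index("id")[cols].replace("", np.nan)
right = df_map.set_index("id")[cols]

# Non-missing values in left win; the gaps are filled from right by id
filled = left.combine_first(right)

# Put the rows back in df's order and write the two columns back
df[cols] = filled.reindex(df["id"]).to_numpy()

print(df)
```

The `reindex(df["id"])` step matters because combine_first aligns on the union of both indexes, so its row order is not guaranteed to match df.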

I found a very simple way to do this using loc:

# Loop over the rows of df_map, which holds the data I need
for i in range(len(df_map)):

    # Get the id value of this row
    map_id = df_map["id"].iloc[i]

    # Get the index of the row with that id in the original dataframe
    # (.tolist()[0] raises IndexError if the id is missing from df)
    ind = df.index[df['id'] == map_id].tolist()[0]

    # Replace that row's values in every column df_map has (id, new_id,
    # trade, fruit); the assigned Series is aligned on the column names
    df.loc[ind, list(df_map)] = df_map.iloc[i]
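The loop matches rows by id one at a time. A vectorized sketch of the same idea uses DataFrame.update, which overwrites values in place wherever the index and column labels of the two frames match, skipping NaN values in the argument. It assumes ids are unique in df_map, and it passes only trade and fruit, so unlike the loop it never writes id or new_id. The `updated` name is illustrative.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"],
})
df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"],
})

# Index by id so rows are matched by id rather than by position
updated = df.set_index("id")

# update() modifies `updated` in place, copying non-NaN values from the
# argument wherever both the id and the column name match
updated.update(df_map.set_index("id")[["trade", "fruit"]])

# Turn id back into a regular column, keeping df's original column order
df = updated.reset_index()[df.columns]

print(df)
```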
