Replace column value with column value in another table based on condition?
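I have a dataframe with some missing values in the rows where "new_id" is an empty string. I have another dataframe that contains the values those rows should have, but it does not have all the columns of the original dataframe, so I can't simply replace the rows by index. My original dataframe looks like: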
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})
id  new_id  color   age  trade  color2  fruit
1           blue    23          red
2           red     11          yellow
3   23      green   17   C      red     orange
4           yellow  13          blue
5   52      green   51   B      purple  grape
The dataframe with the values I need is:
df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})
id  new_id  trade  fruit
1           B      apple
2           C      orange
4           A      apple
Desired output:
id  new_id  color   age  trade  color2  fruit
1           blue    23   B      red     apple
2           red     11   C      yellow  orange
3   23      green   17   C      red     orange
4           yellow  13   A      blue    apple
5   52      green   51   B      purple  grape
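How can I combine the information in the two dataframes to get the complete data set, replacing values only in the rows where "new_id" is an empty string?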
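IIUC, you can first join the two data frames on id, using pd.DataFrame.join or pd.merge, to create a temporary data frame that holds the fruit and trade columns from both of them. You can then apply the pandas version of coalesce, using bfill (or combine_first), and assign the coalesced columns back to your initial data frame.
For more details on coalescing in pandas, see here.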
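To make the coalesce step concrete, here is a minimal sketch on a made-up two-column frame. The column names preferred and fallback are hypothetical and not part of the question. bfill(axis=1) pulls the next non-missing value leftward along each row, so the first column ends up holding the first non-missing value in that row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "preferred" wins when present, "fallback" otherwise
toy = pd.DataFrame({
    "preferred": ["x", np.nan, np.nan],
    "fallback": ["a", "b", np.nan],
})

# Back-fill across columns, then keep the first column as the coalesced result
coalesced = toy[["preferred", "fallback"]].bfill(axis=1).iloc[:, 0]
print(coalesced.tolist())  # ['x', 'b', nan]
```

Note that this only treats NaN as missing. An empty string counts as a value, which is why the column that should take precedence has to come first.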
Code:
import pandas as pd

# Define data frames
df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})
df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})

# Join both data frames by setting id as the index
df_temp = (
    df.set_index(["id"])[["trade", "fruit"]]
    .join(
        df_map.set_index(["id"])
        .drop(columns=["new_id"])
        .rename(columns=lambda x: "temp_" + x)
    )
    .reset_index(drop=True)
)

# Apply coalesce: the temp_ column (from df_map) wins when it is not NaN
df["fruit"], df["trade"] = (
    df_temp[["temp_fruit", "fruit"]].bfill(axis=1).iloc[:, 0],
    df_temp[["temp_trade", "trade"]].bfill(axis=1).iloc[:, 0]
)
Output:
  id new_id   color  age trade  color2   fruit
0  1           blue   23     B     red   apple
1  2            red   11     C  yellow  orange
2  3     23   green   17     C     red  orange
3  4         yellow   13     A    blue   apple
4  5     52   green   51     B  purple   grape
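The answer mentions combine_first as an alternative to bfill but does not show it. A possible sketch of that variant follows. The names cols, left, right, filled and result are introduced here for illustration, and the sketch assumes that id is unique in both data frames.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})
df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})

cols = ["trade", "fruit"]
left = df.set_index("id")[cols]       # original values, indexed by id
right = df_map.set_index("id")[cols]  # replacement values, indexed by id

# Values from df_map take precedence; ids absent from df_map fall back to df
filled = right.combine_first(left).reindex(left.index)

# Write the coalesced columns back in the original row order
result = df.copy()
result[cols] = filled[cols].to_numpy()
print(result)
```

Calling combine_first on the df_map side is what gives its values priority, mirroring the column order used with bfill above.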
I found a very simple way to do this using loc.
# Loop over the data frame that holds the values I need
for i in range(len(df_map)):
    # Get the id value
    map_id = df_map.id[i]
    # Get the index of the row with that id in the original dataframe
    ind = df.index[df['id'] == map_id].tolist()[0]
    # Replace the values in the columns that df_map also has
    df.loc[ind, list(df_map)] = df_map.iloc[i]
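For larger frames, a vectorized variant that avoids the row loop might look like the sketch below. It also makes the question's condition explicit by touching only rows where new_id is an empty string. The names mask, lookup, mapped and result are introduced here for illustration, and the sketch assumes that id is unique in df_map.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "2", "3", "4", "5"],
    "new_id": ["", "", "23", "", "52"],
    "color": ["blue", "red", "green", "yellow", "green"],
    "age": [23, 11, 17, 13, 51],
    "trade": ["", "", "C", "", "B"],
    "color2": ["red", "yellow", "red", "blue", "purple"],
    "fruit": ["", "", "orange", "", "grape"]
})
df_map = pd.DataFrame({
    "id": ["1", "2", "4"],
    "new_id": ["", "", ""],
    "trade": ["B", "C", "A"],
    "fruit": ["apple", "orange", "apple"]
})

result = df.copy()
mask = result["new_id"].eq("")    # rows that are allowed to change
lookup = df_map.set_index("id")   # id -> replacement values

for col in ["trade", "fruit"]:
    # Look up each id in df_map; ids without a match give NaN
    mapped = result["id"].map(lookup[col])
    # Keep the original value outside the mask or when no match exists
    result[col] = result[col].where(~mask, mapped.fillna(result[col]))

print(result)
```

The loop here runs once per column rather than once per row, so the cost no longer grows with the number of rows in df_map.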