
How to merge two dataframes when one dataframe has more than one row matching the condition?

Assume we have two dataframes:

    import pandas as pd

    df1 = pd.DataFrame(
        [["foo", 10], ["bar", 20]],
        columns=["1", "2"],
        index=["x", "y"]
    )
    df2 = pd.DataFrame(
        [["foo", 10, 20], ["bar", 20, 30], ["foo", 10, 30]],
        columns=["1", "2", "3"],
        index=["x", "y", "z"]
    )

This gives us:

[Image: output of df1 and df2]

If I merge these dataframes on two key columns:

    df3 = pd.merge(df1, df2, how='left', on=['1','2'])

This gives us:

[Image: output of merged df3]
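The extra row appears because the key pair ("foo", 10) occurs twice in df2, so a left merge emits one output row per match. A minimal sketch that reproduces this with the frames from the question (the name dup_keys is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame(
    [["foo", 10], ["bar", 20]],
    columns=["1", "2"],
    index=["x", "y"],
)
df2 = pd.DataFrame(
    [["foo", 10, 20], ["bar", 20, 30], ["foo", 10, 30]],
    columns=["1", "2", "3"],
    index=["x", "y", "z"],
)

# Rows of df2 whose (1, 2) key pair occurs more than once
dup_keys = df2[df2.duplicated(subset=["1", "2"], keep=False)]
print(dup_keys)

# The left merge yields one row per matching df2 row, so "foo", 10 appears twice
df3 = pd.merge(df1, df2, how="left", on=["1", "2"])
print(df3)
```

Note that the merged frame also gets a fresh integer index; the labels x and y from df1 are not carried over.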

If I wanted the mean of the values from df2 that match the condition to appear in df3, how would I go about doing that? (So instead of two rows of foo & 10, I would have only one row of foo & 10, with the third column holding the average of the two matching rows.) I have provided a picture below for clarity:

[Image: wanted output of merged df3]

If you want the mean, aggregate df2 by the key columns first, then merge:

    df2 = df2.groupby(['1','2'], as_index=False).mean()
    df3 = pd.merge(df1, df2, how='left', on=['1','2'])
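For reference, here is a self-contained sketch of the same approach run end to end on the question's data. Two details are assumptions added for illustration rather than part of the answer: the aggregated frame is stored under a new name, df2_mean, so df2 is not overwritten, and df1's index is round-tripped through reset_index/set_index because a plain merge replaces it with 0, 1, ...

```python
import pandas as pd

df1 = pd.DataFrame(
    [["foo", 10], ["bar", 20]],
    columns=["1", "2"],
    index=["x", "y"],
)
df2 = pd.DataFrame(
    [["foo", 10, 20], ["bar", 20, 30], ["foo", 10, 30]],
    columns=["1", "2", "3"],
    index=["x", "y", "z"],
)

# Collapse df2 to one row per (1, 2) key pair, averaging column "3"
df2_mean = df2.groupby(["1", "2"], as_index=False)["3"].mean()

# Optional: keep df1's labels by moving the index into a column before
# merging and restoring it afterwards (reset_index names that column "index")
df3 = (
    df1.reset_index()
    .merge(df2_mean, how="left", on=["1", "2"])
    .set_index("index")
    .rename_axis(None)
)
print(df3)
```

Because each key pair now appears only once in df2_mean, the left merge returns exactly one row per row of df1, and column "3" becomes a float column holding the averages.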
