
Filling empty values in one dataframe based on column in another dataframe?

I have a first dataframe, which includes some missing values in a column. I then have a second dataframe, which includes a more complete dataset, but not necessarily at exactly the same indices. As an example, here's a depiction of the situation:

[Image: the two dataframes, df1 with missing values and df2 with the more complete data]

It's clear that filling the indices that match is easy (e.g., the first NaN can be filled with 634 from the second dataframe). For the indices that aren't in the other dataframe, I would like to interpolate between the two nearest values (e.g., to fill the value at 5.0, I would like to interpolate between 4.8 and 5.2 in df2). I'm not sure how to do this, at least not in an idiomatic pandas way. My instinct is to iterate through the missing values, manually find the closest indices in df2, and then interpolate between them. I'm sure there's a smarter way of going about this, though. Any tips?

I changed the column name Index -> arg to avoid confusion.

First, load the data frames:

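import numpy as np
import pandas as pd

# The entries at arg 1.0 and 5.0 are the missing values described in the question.
df1 = pd.DataFrame({
    'arg': {0: 1.0, 1: 2.3, 2: 2.5, 3: 3.6, 4: 5.0, 5: 5.9, 6: 6.0, 7: 6.2, 8: 6.3, 9: 6.4},
    'value': {0: np.nan, 1: 500.0, 2: 439.0, 3: 287.0, 4: np.nan, 5: 212.0, 6: 374.0, 7: 358.0, 8: 600.0, 9: 755.0}
})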
df2 = pd.DataFrame({
    'arg': {0: 1.0, 1: 1.4, 2: 1.8, 3: 2.2, 4: 2.4, 5: 2.8, 6: 3.2, 7: 3.6, 8: 4.0, 9: 4.4, 10: 4.8, 11: 5.2, 12: 5.6, 13: 6.0, 14: 6.4},
    'value': {0: 634, 1: 8, 2: 218, 3: 813, 4: 338, 5: 339, 6: 935, 7: 287, 8: 376, 9: 481, 10: 727, 11: 555, 12: 50, 13: 374, 14: 755}
})

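Left-join df2 onto df1 on arg, then fill df1's values from df2 where the arguments match exactly:

temp = df1.merge(df2, on="arg", how="left")
# value_x comes from df1, value_y from df2. This takes df2's value where one
# exists and falls back to df1's value otherwise.
df1["value"] = temp.value_y.combine_first(temp.value_x)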
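The order of the two operands of `combine_first` decides which side wins when both have a value. The following is a minimal sketch on made-up Series (the names `own` and `other` and their values are hypothetical, standing in for `value_x` and `value_y`), showing both orders:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for temp.value_x (from df1) and temp.value_y (from df2).
own = pd.Series([np.nan, 500.0, 287.0])
other = pd.Series([634.0, np.nan, 999.0])

# The caller's non-NaN values are kept; gaps are filled from the argument.
prefer_other = other.combine_first(own)  # like value_y.combine_first(value_x)
prefer_own = own.combine_first(other)    # like value_x.combine_first(value_y)

print(prefer_other.tolist())
print(prefer_own.tolist())
```

If existing values in df1 should never be overwritten, the second order is probably the safer choice. With the data in this post both orders give the same result, because df1 and df2 agree wherever both have a value.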

Select the rows whose values are still NaN:

to_interpolate = df1[df1.value.isna()]

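Add the arguments without values to df2 and interpolate their values:

df3 = pd.concat([to_interpolate, df2]).sort_values("arg")
# Assign the result back instead of calling interpolate(inplace=True) on
# df3.value, which may not modify df3 in recent pandas versions.
# The default linear method treats the rows as equally spaced.
df3["value"] = df3["value"].interpolate()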
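Note that the default `interpolate()` ignores how far apart the `arg` values are and simply places each missing value evenly between its neighbouring rows. If the result should instead be weighted by distance in `arg`, one option is to put `arg` in the index and use `method="index"`. The sketch below uses a two-row excerpt of df2 and a single missing argument; the variable names `missing`, `equal_spacing` and `weighted` are only for illustration:

```python
import numpy as np
import pandas as pd

# Excerpt of df2 around the gap, plus one argument whose value is missing.
df2 = pd.DataFrame({"arg": [5.6, 6.0], "value": [50.0, 374.0]})
missing = pd.DataFrame({"arg": [5.9], "value": [np.nan]})

df3 = pd.concat([missing, df2], ignore_index=True).sort_values("arg")

# Default (method="linear"): rows are treated as equally spaced.
equal_spacing = df3["value"].interpolate()

# method="index": the numeric index (here arg) is used as the x-coordinate.
weighted = df3.set_index("arg")["value"].interpolate(method="index")

print(equal_spacing.iloc[1])  # midpoint of 50 and 374
print(weighted.loc[5.9])      # three quarters of the way from 50 to 374
```

The output shown in this answer corresponds to the equal-spacing behaviour; which of the two is appropriate depends on what the data represent.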

Repeat the merge, this time filling only the values that are still missing:

temp = df1.merge(df3, on="arg", how="left")
df1["value"] = temp.value_x.combine_first(temp.value_y)
print(df1)

Output:

   arg  value
0  1.0  634.0
1  2.3  500.0
2  2.5  439.0
3  3.6  287.0
4  5.0  641.0
5  5.9  212.0
6  6.0  374.0
7  6.2  358.0
8  6.3  600.0
9  6.4  755.0
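As a possible shortcut, `numpy.interp` can handle both cases in one pass: at an `arg` that exists in df2 it returns df2's value, and between two known points it interpolates linearly, weighted by distance in `arg`. This is a sketch of an alternative, not the method used above, and it uses only an excerpt of the data. It assumes df2's `arg` values are unique and sorted, and note that `np.interp` returns the edge values for arguments outside df2's range rather than extrapolating.

```python
import numpy as np
import pandas as pd

# Excerpts of the two frames; NaN marks the values to fill.
df1 = pd.DataFrame({"arg": [1.0, 2.3, 5.0, 5.9],
                    "value": [np.nan, 500.0, np.nan, np.nan]})
df2 = pd.DataFrame({"arg": [1.0, 4.8, 5.2, 5.6, 6.0],
                    "value": [634.0, 727.0, 555.0, 50.0, 374.0]})

# np.interp requires increasing x-coordinates.
ref = df2.sort_values("arg")

# Fill only the missing entries; existing df1 values are left untouched.
mask = df1["value"].isna()
df1.loc[mask, "value"] = np.interp(df1.loc[mask, "arg"], ref["arg"], ref["value"])

print(df1)
```

Because the weighting follows `arg`, a value such as the one at 5.9 comes out differently from the equal-spacing result of the default `interpolate()` call used in the steps above.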
