Pandas: interpolate missing values based on criteria in other columns

Question

I have the following dataframe:

Date	Source	Type	Visits	Sales
01/01/2020	Source1	Type1	100	10
01/01/2020	Source2	Type1	150	5
02/01/2020	Source1	Type1	NaN	NaN
02/01/2020	Source2	Type1	125	15
03/01/2020	Source1	Type2	150	18
03/01/2020	Source2	Type2	NaN	NaN
04/01/2020	Source1	Type2	150	25
04/01/2020	Source2	Type2	120	05

What I'd like to do is a simple .interpolate() for the missing data. However, I need to group it by Source and Type to keep the data as accurate as possible, rather than basing it on the rows above and below, which aren't relevant.

I've got to this stage:

df_fixed = df[['Source','Type','Visits','Sales']].loc[(df['Source'] == 'Source1') & (df['Type'] == 'Type1')].interpolate()

This does the first step, but I can't get any further and I feel like there's an easier way.

What would be the most elegant way to complete this?

Answer 1

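One idea is to replace the NaN values with the median or the mean.

# Fill missing Visits with the column median (assign back instead of using inplace on a column)
df['Visits'] = df['Visits'].fillna(df['Visits'].median())
# Or fill every numeric column with its own mean
df = df.fillna(df.mean(numeric_only=True))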
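Since the question asks for fills that respect Source and Type, the mean idea can also be applied per group. The following is only a sketch under that assumption: the column name `Visits_filled` is made up for illustration, and the data is a cut-down copy of the table in the question.

```python
import numpy as np
import pandas as pd

# Cut-down copy of the question's table (Date and Sales omitted for brevity).
df = pd.DataFrame({
    "Source": ["Source1", "Source2"] * 4,
    "Type": ["Type1"] * 4 + ["Type2"] * 4,
    "Visits": [100, 150, np.nan, 125, 150, np.nan, 150, 120],
})

# Mean of Visits within each (Source, Type) group, aligned to the original rows.
group_mean = df.groupby(["Source", "Type"])["Visits"].transform("mean")

# Hypothetical output column: only the NaN entries take the group mean.
df["Visits_filled"] = df["Visits"].fillna(group_mean)
```

With this sample, each group contains only one known value, so the group mean is simply that value.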

Edit:

If you decide to use .interpolate() and need to group by Source and Type, you can use the groupby() method:

df.groupby(['Source', 'Type'])
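The groupby call on its own only defines the groups; the interpolation still has to be applied inside each one. A minimal sketch of one way to do that, assuming linear interpolation is wanted and that gaps at the start or end of a group should be filled from the nearest known value (`limit_direction="both"` is a choice made here, not something the question specifies):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["01/01/2020", "01/01/2020", "02/01/2020", "02/01/2020",
             "03/01/2020", "03/01/2020", "04/01/2020", "04/01/2020"],
    "Source": ["Source1", "Source2"] * 4,
    "Type": ["Type1"] * 4 + ["Type2"] * 4,
    "Visits": [100, 150, np.nan, 125, 150, np.nan, 150, 120],
    "Sales": [10, 5, np.nan, 15, 18, np.nan, 25, 5],
})

cols = ["Visits", "Sales"]

# transform() runs the lambda on each column of each (Source, Type) group
# and returns a result aligned with the original index, so it can be
# assigned straight back to the same columns.
df[cols] = df.groupby(["Source", "Type"])[cols].transform(
    lambda s: s.interpolate(limit_direction="both")
)
```

Note that in this sample every (Source, Type) group has only two rows, so each missing value sits at the edge of its group and is filled from the single neighbour in that group rather than interpolated between two points. With longer series per group, interior gaps would be interpolated linearly.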
