Nested ifelse alternative in pandas
Suppose we've got a test dataset:
value group
123 1
120 1
NA 1
130 1
23 2
22 2
24 2
NA 2
Now we want to replace missing values with group-wise median values. In R we can do it using a nested ifelse call.
first.med <- median(test[test$group == 1, ]$value, na.rm = TRUE)
second.med <- median(test[test$group == 2, ]$value, na.rm = TRUE)
test$value <- ifelse(is.na(test$value) & test$group == 1, first.med,
                     ifelse(is.na(test$value) & test$group == 2, second.med,
                            test$value))
I thought about applying the numpy.where function or the pandas.DataFrame.Set.map method as showcased here, but neither technique supports nesting. I can think of a list comprehension to do this, but I would like to know if there is an alternative in the realm of NumPy/pandas. Thank you in advance.
In this case, you can use a groupby to fill by the group median:
In [16]: df.groupby('group')['value'].apply(lambda x: x.fillna(x.median()))
Out[16]:
0 123
1 120
2 123
3 130
4 23
5 22
6 24
7 23
dtype: float64
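The same group-median fill can also be sketched with transform, which returns a result aligned with the original index. This is a minimal, self-contained version that assumes the question's data is held in a DataFrame named df with NaN standing in for R's NA; the names group_median and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# The test data from the question, with NaN standing in for R's NA.
df = pd.DataFrame({
    "value": [123, 120, np.nan, 130, 23, 22, 24, np.nan],
    "group": [1, 1, 1, 1, 2, 2, 2, 2],
})

# transform broadcasts each group's median back onto the rows of that group,
# so the result shares df's index and can be handed to fillna directly.
group_median = df.groupby("group")["value"].transform("median")
filled = df["value"].fillna(group_median)

print(filled.tolist())
```

Assigning the result back with df["value"] = filled would update the column in place of the R-style reassignment.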
In general, though, both of those methods can be nested just fine. E.g., you could do:
In [23]: medians = df.groupby('group')['value'].median()
In [24]: np.where(pd.isnull(df['value']),
                  np.where(df['group'] == 1, medians.loc[1], medians.loc[2]),
                  df['value'])
Out[24]: array([ 123., 120., 123., 130., 23., 22., 24., 23.])
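With more than two groups, nesting one np.where per group gets unwieldy. A possible alternative, sketched here on the question's data, is to look up each row's group median with Series.map and fill from that; the variable names are illustrative and the DataFrame layout is assumed to match the question's.

```python
import numpy as np
import pandas as pd

# The test data from the question, with NaN standing in for R's NA.
df = pd.DataFrame({
    "value": [123, 120, np.nan, 130, 23, 22, 24, np.nan],
    "group": [1, 1, 1, 1, 2, 2, 2, 2],
})

# One median per group, indexed by the group label.
medians = df.groupby("group")["value"].median()

# Map every row's group label to its median, then use that value
# only where the original entry is missing.
filled = df["value"].fillna(df["group"].map(medians))

print(filled.tolist())
```

This needs no per-group branch, so it works unchanged for any number of groups.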
import numpy as np
import pandas as pd
df = pd.DataFrame({'value': [123, 120, np.nan, 130, 23, 22, 24, np.nan],
                   'group': [1, 1, 1, 1, 2, 2, 2, 2]})

def replace_with_median(values):
    # Fill the missing entries of one group with that group's median.
    return values.fillna(values.median())

df['value'] = df.groupby('group')['value'].transform(replace_with_median)