Efficient way to apply a conditional function to data grouped by day in Pandas

Question

I want to apply a conditional function to data grouped by day: for each column in which more than half of the values on a given day equal 0, set all of that column's values for that day to np.nan.

date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,0,0
2016-01-02 10:00:00,0,0
2016-01-02 11:00:00,0,0
2016-01-02 12:00:00,0,0
2016-01-02 13:00:00,1,0
2016-01-02 14:00:00,0,0
2016-01-02 15:00:00,0,0
2016-01-02 16:00:00,0,0
2016-01-02 17:00:00,0,0
2016-01-02 18:00:00,0,0
2016-01-02 19:00:00,0,0
2016-01-02 20:00:00,1,0
2016-01-02 21:00:00,0,0
2016-01-02 22:00:00,0,0
2016-01-02 23:00:00,0,0

Desired output:

date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,null,null
2016-01-02 10:00:00,null,null
2016-01-02 11:00:00,null,null
2016-01-02 12:00:00,null,null
2016-01-02 13:00:00,null,null
2016-01-02 14:00:00,null,null
2016-01-02 15:00:00,null,null
2016-01-02 16:00:00,null,null
2016-01-02 17:00:00,null,null
2016-01-02 18:00:00,null,null
2016-01-02 19:00:00,null,null
2016-01-02 20:00:00,null,null
2016-01-02 21:00:00,null,null
2016-01-02 22:00:00,null,null
2016-01-02 23:00:00,null,null

I have read this question: pandas apply function to data grouped by day, and tried to follow it:

df_mode = df.groupby(df.index.date).apply(lambda x: mode(x)[0])

This gives me the most frequent value for each day in each column. However, I don't know how to carry out the next step (setting all values in the column for that day to np.nan).

Is there a more efficient way than using apply in this case?

Thank you

Answer 1

Use GroupBy.transform: compare the values with 0, take the per-day mean of the result to get the fraction of zeros, and then set missing values with DataFrame.mask:

df = df.mask(df.eq(0).groupby(df.index.date).transform('mean').gt(.5))
print (df)
                     value1  value2
date                               
2016-01-01 09:00:00    14.0    14.0
2016-01-01 10:00:00    12.0    13.0
2016-01-01 11:00:00    11.0    13.0
2016-01-01 12:00:00    11.0     9.0
2016-01-01 13:00:00    17.0    21.0
2016-01-01 14:00:00     9.0    22.0
2016-01-01 15:00:00    10.0     9.0
2016-01-01 16:00:00    11.0     9.0
2016-01-01 17:00:00     8.0     8.0
2016-01-01 18:00:00     4.0     2.0
2016-01-01 19:00:00     5.0     7.0
2016-01-01 20:00:00     5.0     5.0
2016-01-01 21:00:00     3.0     4.0
2016-01-01 22:00:00     2.0     4.0
2016-01-01 23:00:00     2.0     4.0
2016-01-02 09:00:00     NaN     NaN
2016-01-02 10:00:00     NaN     NaN
2016-01-02 11:00:00     NaN     NaN
2016-01-02 12:00:00     NaN     NaN
2016-01-02 13:00:00     NaN     NaN
2016-01-02 14:00:00     NaN     NaN
2016-01-02 15:00:00     NaN     NaN
2016-01-02 16:00:00     NaN     NaN
2016-01-02 17:00:00     NaN     NaN
2016-01-02 18:00:00     NaN     NaN
2016-01-02 19:00:00     NaN     NaN
2016-01-02 20:00:00     NaN     NaN
2016-01-02 21:00:00     NaN     NaN
2016-01-02 22:00:00     NaN     NaN
2016-01-02 23:00:00     NaN     NaN
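
To see why the one-liner works, it may help to split it into its intermediate steps. The sketch below uses a small made-up frame (not the data from the question) and hypothetical variable names (is_zero, zero_share, too_many_zeros); it assumes the index is a DatetimeIndex, as in the question.

```python
import pandas as pd

# Hypothetical miniature of the question's data: two days, four rows each.
idx = pd.to_datetime([
    "2016-01-01 09:00", "2016-01-01 10:00", "2016-01-01 11:00", "2016-01-01 12:00",
    "2016-01-02 09:00", "2016-01-02 10:00", "2016-01-02 11:00", "2016-01-02 12:00",
])
df = pd.DataFrame(
    {"value1": [14, 0, 11, 0, 0, 0, 1, 0],
     "value2": [14, 13, 13, 9, 0, 0, 0, 0]},
    index=idx,
)

# Step 1: boolean frame, True wherever a value equals 0.
is_zero = df.eq(0)

# Step 2: the per-day mean of a boolean column is the fraction of zeros
# that day; transform broadcasts it back to every row of the day.
zero_share = is_zero.groupby(df.index.date).transform("mean")

# Step 3: strictly more than half zeros (exactly half does not qualify).
too_many_zeros = zero_share.gt(0.5)

# Step 4: mask replaces the flagged cells with NaN and keeps the rest.
result = df.mask(too_many_zeros)
print(result)
```

In this made-up frame, value1 on 2016-01-01 has exactly half of its values equal to 0, so it is left alone because the comparison is strict. Both columns on 2016-01-02 exceed one half and become NaN. Since transform returns a frame with the same shape as df, no Python-level function is called per group, which is usually where the gain over apply comes from.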

Answer 1 (4, accepted): 2019-12-18 10:29:01
