Efficient way to apply conditional function to data grouped by day in Pandas

I want to apply a conditional function to data grouped by day: for each column in which more than half of the values on a given day are equal to 0, set all of that column's values for that day to np.nan.

date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,0,0
2016-01-02 10:00:00,0,0
2016-01-02 11:00:00,0,0
2016-01-02 12:00:00,0,0
2016-01-02 13:00:00,1,0
2016-01-02 14:00:00,0,0
2016-01-02 15:00:00,0,0
2016-01-02 16:00:00,0,0
2016-01-02 17:00:00,0,0
2016-01-02 18:00:00,0,0
2016-01-02 19:00:00,0,0
2016-01-02 20:00:00,1,0
2016-01-02 21:00:00,0,0
2016-01-02 22:00:00,0,0
2016-01-02 23:00:00,0,0

Desired output:

date,value1,value2
2016-01-01 09:00:00,14,14
2016-01-01 10:00:00,12,13
2016-01-01 11:00:00,11,13
2016-01-01 12:00:00,11,9
2016-01-01 13:00:00,17,21
2016-01-01 14:00:00,9,22
2016-01-01 15:00:00,10,9
2016-01-01 16:00:00,11,9
2016-01-01 17:00:00,8,8
2016-01-01 18:00:00,4,2
2016-01-01 19:00:00,5,7
2016-01-01 20:00:00,5,5
2016-01-01 21:00:00,3,4
2016-01-01 22:00:00,2,4
2016-01-01 23:00:00,2,4
2016-01-02 09:00:00,null,null
2016-01-02 10:00:00,null,null
2016-01-02 11:00:00,null,null
2016-01-02 12:00:00,null,null
2016-01-02 13:00:00,null,null
2016-01-02 14:00:00,null,null
2016-01-02 15:00:00,null,null
2016-01-02 16:00:00,null,null
2016-01-02 17:00:00,null,null
2016-01-02 18:00:00,null,null
2016-01-02 19:00:00,null,null
2016-01-02 20:00:00,null,null
2016-01-02 21:00:00,null,null
2016-01-02 22:00:00,null,null
2016-01-02 23:00:00,null,null

I have read the question "pandas apply function to data grouped by day" and tried to follow it:

df_mode = df.groupby(df.index.date).apply(lambda x: mode(x)[0])

This gives me the most frequent value for each day in each column. However, I don't know how to do the next step (setting all values in the column for that day to np.nan).

Also, is there a more efficient way than using apply in this case?

Thank you

Use GroupBy.transform: compare the values with 0, take the mean per day to get the fraction of zeros, and then set the missing values with DataFrame.mask:

df = df.mask(df.eq(0).groupby(df.index.date).transform('mean').gt(.5))
print (df)
                     value1  value2
date                               
2016-01-01 09:00:00    14.0    14.0
2016-01-01 10:00:00    12.0    13.0
2016-01-01 11:00:00    11.0    13.0
2016-01-01 12:00:00    11.0     9.0
2016-01-01 13:00:00    17.0    21.0
2016-01-01 14:00:00     9.0    22.0
2016-01-01 15:00:00    10.0     9.0
2016-01-01 16:00:00    11.0     9.0
2016-01-01 17:00:00     8.0     8.0
2016-01-01 18:00:00     4.0     2.0
2016-01-01 19:00:00     5.0     7.0
2016-01-01 20:00:00     5.0     5.0
2016-01-01 21:00:00     3.0     4.0
2016-01-01 22:00:00     2.0     4.0
2016-01-01 23:00:00     2.0     4.0
2016-01-02 09:00:00     NaN     NaN
2016-01-02 10:00:00     NaN     NaN
2016-01-02 11:00:00     NaN     NaN
2016-01-02 12:00:00     NaN     NaN
2016-01-02 13:00:00     NaN     NaN
2016-01-02 14:00:00     NaN     NaN
2016-01-02 15:00:00     NaN     NaN
2016-01-02 16:00:00     NaN     NaN
2016-01-02 17:00:00     NaN     NaN
2016-01-02 18:00:00     NaN     NaN
2016-01-02 19:00:00     NaN     NaN
2016-01-02 20:00:00     NaN     NaN
2016-01-02 21:00:00     NaN     NaN
2016-01-02 22:00:00     NaN     NaN
2016-01-02 23:00:00     NaN     NaN
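
To see what each part of the one-liner contributes, it can be split into named steps. The following is only a sketch on a small made-up frame (the sample values and the names is_zero, zero_share and to_blank are mine, not from the question), assuming a DatetimeIndex as in the question:

```python
import pandas as pd

# Hypothetical sample: two days with four readings each.
idx = pd.to_datetime([
    "2016-01-01 09:00", "2016-01-01 10:00", "2016-01-01 11:00", "2016-01-01 12:00",
    "2016-01-02 09:00", "2016-01-02 10:00", "2016-01-02 11:00", "2016-01-02 12:00",
])
df = pd.DataFrame(
    {
        "value1": [14, 12, 0, 11, 0, 0, 1, 0],   # day 2: 3 of 4 values are zero
        "value2": [14, 13, 13, 9, 0, 0, 5, 7],   # day 2: exactly 2 of 4 are zero
    },
    index=idx,
)

# Step 1: boolean frame, True where a value equals 0.
is_zero = df.eq(0)

# Step 2: per day and per column, the fraction of zeros, broadcast back
# to every row of that day (the mean of booleans is the share of True).
zero_share = is_zero.groupby(df.index.date).transform("mean")

# Step 3: True for every cell whose (day, column) has more than half zeros.
to_blank = zero_share.gt(0.5)

# Step 4: replace the flagged cells with NaN, leave the rest untouched.
result = df.mask(to_blank)

# The same thing written as the one-liner from above.
one_liner = df.mask(df.eq(0).groupby(df.index.date).transform("mean").gt(.5))

print(result)
```

Here only value1 on 2016-01-02 is blanked. value2 on that day has exactly half zeros, and because the comparison is a strict gt(.5) it is kept; use ge(.5) if "half or more" should also be blanked. Columns that receive NaN are converted to float, which is why the output above shows 14.0 instead of 14.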
