Removing rows with less than x consecutive non NaN values
I have a data frame with a time_date index and 29 columns of data. The data frame has isolated non-NaN values, which I want to remove (or convert into NaN), keeping only runs of, let's say, at least 5 consecutive values. For example, initially, I have:
| Date | A | B | C | D |
|---|---|---|---|---|
| 18/01/2018 7:00 | NaN | NaN | 3.493148804 | -3.861461957 |
| 19/01/2018 6:00 | 0.000643658 | NaN | 4.493148804 | -3.861461957 |
| 19/01/2018 7:00 | 0.003109299 | NaN | 7.247741699 | -4.749528885 |
| 19/01/2018 8:00 | 0.003109299 | -0.031979417 | NaN | -3.726334095 |
| 19/01/2018 9:00 | 0.003109299 | -0.031979417 | NaN | 0.13656346 |
| 19/01/2018 10:00 | NaN | -0.031979417 | NaN | 2.823025544 |
| 19/01/2018 11:00 | NaN | -0.031979417 | NaN | 3.529650052 |
| 19/01/2018 12:00 | NaN | -0.038014129 | 0.006496742 | 4.243628979 |
| 19/01/2018 13:00 | -0.003737779 | NaN | -0.003895367 | 5.595969041 |
| 19/01/2018 14:00 | -0.003999399 | NaN | -0.013323511 | 6.294107278 |
| 19/01/2018 15:00 | -0.003999399 | 2.823025544 | -0.026859129 | 5.231494427 |
| 19/01/2018 16:00 | -0.003999399 | 3.529650052 | -0.031979417 | 5.075140158 |
| 19/01/2018 17:00 | -0.003999399 | NaN | -0.038014129 | 4.057830334 |
| 19/01/2018 18:00 | -0.003999399 | NaN | NaN | 4.384686947 |
What I want is something like this:
| Date | A | B | C | D |
|---|---|---|---|---|
| 18/01/2018 7:00 | NaN | NaN | NaN | -3.861461957 |
| 19/01/2018 6:00 | NaN | NaN | NaN | -3.861461957 |
| 19/01/2018 7:00 | NaN | NaN | NaN | -4.749528885 |
| 19/01/2018 8:00 | NaN | -0.031979417 | NaN | -3.726334095 |
| 19/01/2018 9:00 | NaN | -0.031979417 | NaN | 0.13656346 |
| 19/01/2018 10:00 | NaN | -0.031979417 | NaN | 2.823025544 |
| 19/01/2018 11:00 | NaN | -0.031979417 | NaN | 3.529650052 |
| 19/01/2018 12:00 | NaN | -0.038014129 | 0.006496742 | 4.243628979 |
| 19/01/2018 13:00 | -0.003737779 | NaN | -0.003895367 | 5.595969041 |
| 19/01/2018 14:00 | -0.003999399 | NaN | -0.013323511 | 6.294107278 |
| 19/01/2018 15:00 | -0.003999399 | NaN | -0.026859129 | 5.231494427 |
| 19/01/2018 16:00 | -0.003999399 | NaN | -0.031979417 | 5.075140158 |
| 19/01/2018 17:00 | -0.003999399 | NaN | -0.038014129 | 4.057830334 |
| 19/01/2018 18:00 | -0.003999399 | NaN | NaN | 4.384686947 |
The real number of required consecutive rows is 24, so any run of fewer than 24 consecutive non-NaN values should be converted into NaN. In other words, I only want episodes having a length of at least 24 in my data. Any help would be much appreciated. Thank you
Use a custom function to mask values if the run of consecutive non-NA values is shorter than N, then `apply` it to all columns:
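
    def mask_below(s, N=5):
        # True where the value is missing; every NaN starts a new group id
        m1 = s.isna()
        # count the non-NaN values in each group and flag runs of at least N
        m2 = s.groupby(m1.cumsum()).transform('count').ge(N)
        # keep NaNs as they are and values in long-enough runs; mask the rest
        return s.where(m1 | m2)

    df.set_index('Date').apply(mask_below).reset_index()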
Output:

                    Date          A          B          C          D
    0    18/01/2018 7:00        NaN        NaN        NaN  -3.861462
    1    19/01/2018 6:00        NaN        NaN        NaN  -3.861462
    2    19/01/2018 7:00        NaN        NaN        NaN  -4.749529
    3    19/01/2018 8:00        NaN  -0.031979        NaN  -3.726334
    4    19/01/2018 9:00        NaN  -0.031979        NaN   0.136563
    5   19/01/2018 10:00        NaN  -0.031979        NaN   2.823026
    6   19/01/2018 11:00        NaN  -0.031979        NaN   3.529650
    7   19/01/2018 12:00        NaN  -0.038014   0.006497   4.243629
    8   19/01/2018 13:00  -0.003738        NaN  -0.003895   5.595969
    9   19/01/2018 14:00  -0.003999        NaN  -0.013324   6.294107
    10  19/01/2018 15:00  -0.003999        NaN  -0.026859   5.231494
    11  19/01/2018 16:00  -0.003999        NaN  -0.031979   5.075140
    12  19/01/2018 17:00  -0.003999        NaN  -0.038014   4.057830
    13  19/01/2018 18:00  -0.003999        NaN        NaN   4.384687
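
To see why the grouping trick works, it may help to look at the intermediate columns on a small series. The sketch below is only an illustration: the toy series `s`, the two-column `df`, and the names `group_id`, `run_len`, and `steps` are made up for this example and are not part of the question's data. It restates `mask_below` so the block runs on its own.

```python
import numpy as np
import pandas as pd

def mask_below(s, N=5):
    m1 = s.isna()
    m2 = s.groupby(m1.cumsum()).transform('count').ge(N)
    return s.where(m1 | m2)

# Hypothetical toy column: a run of 2 values, a gap, a run of 3, a gap, a run of 1
s = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, 5.0, np.nan, 6.0])

is_na = s.isna()
group_id = is_na.cumsum()  # increments at every NaN, so each run gets its own id
run_len = s.groupby(group_id).transform('count')  # 'count' ignores the NaN itself

# Side-by-side view of the intermediate steps
steps = pd.DataFrame({'value': s, 'group_id': group_id, 'run_len': run_len})
print(steps)

# With N=3 only the run 3.0, 4.0, 5.0 is long enough to survive
result = mask_below(s, N=3)
print(result)

# Extra keyword arguments given to DataFrame.apply are forwarded to the function
df = pd.DataFrame({'A': s, 'B': s[::-1].to_numpy()})
out = df.apply(mask_below, N=3)
print(out)
```

For the threshold mentioned in the question, the same forwarding should apply: `df.set_index('Date').apply(mask_below, N=24).reset_index()`.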