
Removing rows with fewer than x consecutive non-NaN values

I have a data frame with a time_date index and 29 columns of data. The data frame has isolated non-NaN values, which I want to remove (or convert into NaN), keeping only runs of, let's say, at least 5 consecutive values. For example, initially I have:

Date              A             B             C             D
18/01/2018 7:00   NaN           NaN           3.493148804   -3.861461957
19/01/2018 6:00   0.000643658   NaN           4.493148804   -3.861461957
19/01/2018 7:00   0.003109299   NaN           7.247741699   -4.749528885
19/01/2018 8:00   0.003109299   -0.031979417  NaN           -3.726334095
19/01/2018 9:00   0.003109299   -0.031979417  NaN           0.13656346
19/01/2018 10:00  NaN           -0.031979417  NaN           2.823025544
19/01/2018 11:00  NaN           -0.031979417  NaN           3.529650052
19/01/2018 12:00  NaN           -0.038014129  0.006496742   4.243628979
19/01/2018 13:00  -0.003737779  NaN           -0.003895367  5.595969041
19/01/2018 14:00  -0.003999399  NaN           -0.013323511  6.294107278
19/01/2018 15:00  -0.003999399  2.823025544   -0.026859129  5.231494427
19/01/2018 16:00  -0.003999399  3.529650052   -0.031979417  5.075140158
19/01/2018 17:00  -0.003999399  NaN           -0.038014129  4.057830334
19/01/2018 18:00  -0.003999399  NaN           NaN           4.384686947
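
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample as a DataFrame. It assumes pandas and numpy are installed, and the variable name `df` and the choice to keep `Date` as a plain string column are assumptions for illustration (my real frame has 29 columns and a datetime index).

```python
import numpy as np
import pandas as pd

nan = np.nan  # shorthand for the missing-value marker

# Sample data copied from the table above; `df` is an illustrative name.
df = pd.DataFrame({
    "Date": [
        "18/01/2018 7:00", "19/01/2018 6:00", "19/01/2018 7:00",
        "19/01/2018 8:00", "19/01/2018 9:00", "19/01/2018 10:00",
        "19/01/2018 11:00", "19/01/2018 12:00", "19/01/2018 13:00",
        "19/01/2018 14:00", "19/01/2018 15:00", "19/01/2018 16:00",
        "19/01/2018 17:00", "19/01/2018 18:00",
    ],
    "A": [nan, 0.000643658, 0.003109299, 0.003109299, 0.003109299,
          nan, nan, nan, -0.003737779, -0.003999399, -0.003999399,
          -0.003999399, -0.003999399, -0.003999399],
    "B": [nan, nan, nan, -0.031979417, -0.031979417, -0.031979417,
          -0.031979417, -0.038014129, nan, nan, 2.823025544,
          3.529650052, nan, nan],
    "C": [3.493148804, 4.493148804, 7.247741699, nan, nan, nan, nan,
          0.006496742, -0.003895367, -0.013323511, -0.026859129,
          -0.031979417, -0.038014129, nan],
    "D": [-3.861461957, -3.861461957, -4.749528885, -3.726334095,
          0.13656346, 2.823025544, 3.529650052, 4.243628979,
          5.595969041, 6.294107278, 5.231494427, 5.075140158,
          4.057830334, 4.384686947],
})
```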

What I want is something like this:

Date              A             B             C             D
18/01/2018 7:00   NaN           NaN           NaN           -3.861461957
19/01/2018 6:00   NaN           NaN           NaN           -3.861461957
19/01/2018 7:00   NaN           NaN           NaN           -4.749528885
19/01/2018 8:00   NaN           -0.031979417  NaN           -3.726334095
19/01/2018 9:00   NaN           -0.031979417  NaN           0.13656346
19/01/2018 10:00  NaN           -0.031979417  NaN           2.823025544
19/01/2018 11:00  NaN           -0.031979417  NaN           3.529650052
19/01/2018 12:00  NaN           -0.038014129  0.006496742   4.243628979
19/01/2018 13:00  -0.003737779  NaN           -0.003895367  5.595969041
19/01/2018 14:00  -0.003999399  NaN           -0.013323511  6.294107278
19/01/2018 15:00  -0.003999399  NaN           -0.026859129  5.231494427
19/01/2018 16:00  -0.003999399  NaN           -0.031979417  5.075140158
19/01/2018 17:00  -0.003999399  NaN           -0.038014129  4.057830334
19/01/2018 18:00  -0.003999399  NaN           NaN           4.384686947

The real required number of consecutive rows is 24, so any run of fewer than 24 consecutive non-NaN values should be converted into NaN. In other words, I only want episodes of length at least 24 in my data. Any help would be much appreciated. Thank you.

Use a custom function that masks values when the run of consecutive non-NA values is shorter than N, then apply it to all columns:

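def mask_below(s, N=5):
    # True where the value is missing
    m1 = s.isna()
    # Each NaN starts a new group; count the non-NaN values per group
    # and flag groups whose run holds at least N values
    m2 = s.groupby(m1.cumsum()).transform('count').ge(N)
    # Keep NaNs as they are and values in long-enough runs; mask the rest
    return s.where(m1 | m2)

df.set_index('Date').apply(mask_below).reset_index()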

Output:

                Date         A         B         C         D
0    18/01/2018 7:00       NaN       NaN       NaN -3.861462
1    19/01/2018 6:00       NaN       NaN       NaN -3.861462
2    19/01/2018 7:00       NaN       NaN       NaN -4.749529
3    19/01/2018 8:00       NaN -0.031979       NaN -3.726334
4    19/01/2018 9:00       NaN -0.031979       NaN  0.136563
5   19/01/2018 10:00       NaN -0.031979       NaN  2.823026
6   19/01/2018 11:00       NaN -0.031979       NaN  3.529650
7   19/01/2018 12:00       NaN -0.038014  0.006497  4.243629
8   19/01/2018 13:00 -0.003738       NaN -0.003895  5.595969
9   19/01/2018 14:00 -0.003999       NaN -0.013324  6.294107
10  19/01/2018 15:00 -0.003999       NaN -0.026859  5.231494
11  19/01/2018 16:00 -0.003999       NaN -0.031979  5.075140
12  19/01/2018 17:00 -0.003999       NaN -0.038014  4.057830
13  19/01/2018 18:00 -0.003999       NaN       NaN  4.384687
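
To see why the grouping trick works, it may help to look at the intermediate values on a small series. The sketch below is only an illustration: the toy series `s`, the threshold of 3, and the names `run_id`, `run_len` and `steps` are made up for this example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

def mask_below(s, N=5):
    m1 = s.isna()
    m2 = s.groupby(m1.cumsum()).transform('count').ge(N)
    return s.where(m1 | m2)

# Toy series (hypothetical): runs of length 2, 3 and 1 separated by NaNs
s = pd.Series([1.0, 2.0, np.nan, 3.0, 4.0, 5.0, np.nan, np.nan, 6.0])

m1 = s.isna()                                   # where the gaps are
run_id = m1.cumsum()                            # increments at every NaN
run_len = s.groupby(run_id).transform('count')  # non-NaN count per group

# Side-by-side view of the intermediate steps
steps = pd.DataFrame({"value": s, "is_nan": m1,
                      "run_id": run_id, "run_len": run_len})
print(steps)

# With N=3 only the middle run (3.0, 4.0, 5.0) survives
result = mask_below(s, N=3)
print(result)
```

Because every NaN bumps the cumulative sum, each group consists of one NaN followed by the run of values after it, and `count` ignores the NaN, so the count equals the run length. For the real threshold of 24 from the question, the keyword can be forwarded through `apply`, for example `df.set_index('Date').apply(mask_below, N=24)`.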


 