Let's say I have a DataFrame like this:
Time A B C D
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 NaN NaN 12087.71 12087.91
2019-06-17 08:47:00 NaN 12088.21 12084.21 12085.21
2019-06-17 08:48:00 NaN 12090.21 NaN NaN
2019-06-17 08:49:00 NaN 12090.21 NaN NaN
2019-06-17 08:50:00 NaN NaN 12504.11 NaN
2019-06-17 08:51:00 NaN NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:53:00 12503.61 12503.61 12503.61 12503.61
2019-06-17 08:54:00 12503.61 12503.61 12503.11 12503.11
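For reproducibility, the sample frame above can be rebuilt with something like the following (values copied from the table; treating the Time column as a DatetimeIndex is my assumption):

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    {
        "A": [12089.89, nan, nan, nan, nan, nan, nan, 12504.11, 12503.61, 12503.61],
        "B": [12089.89, nan, 12088.21, 12090.21, 12090.21, nan, nan, nan, 12503.61, 12503.61],
        "C": [12087.71, 12087.71, 12084.21, nan, nan, 12504.11, 12503.11, 12503.11, 12503.61, 12503.11],
        "D": [12087.71, 12087.91, 12085.21, nan, nan, nan, 12503.11, 12503.11, 12503.61, 12503.11],
    },
    # One-minute timestamps starting at the first row of the table
    index=pd.date_range("2019-06-17 08:45", periods=10, freq="min"),
)
```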
How do I find the length of the longest uninterrupted NaN sequence in the whole df, efficiently? (In the example it's 6.)
EDIT: forgot to emphasize the word "efficiently" — the df is about 1 million rows long.
Let's try apply with a user-defined function, which in turn uses cumsum() to label each run of consecutive NaNs as a block:
def max_na(s):
    isna = s.isna()
    # Each non-NaN value starts a new block; consecutive NaNs share the same label
    blocks = (~isna).cumsum()
    # Count the NaNs in each block and take the largest count
    return isna.groupby(blocks).sum().max()

df.apply(max_na).max()
# 6.0
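Since the question stresses efficiency on ~1M rows, it may be worth dropping to NumPy and computing run lengths per column directly, which avoids the per-column groupby. This is a sketch of that alternative (the function name max_na_run is mine, not from the original answer):

```python
import numpy as np
import pandas as pd

def max_na_run(df):
    """Length of the longest run of consecutive NaNs in any column of df."""
    isna = df.isna().to_numpy()
    best = 0
    for col in range(isna.shape[1]):
        m = isna[:, col]
        # Pad with False on both sides so every NaN run has a clear start and end,
        # then diff: +1 marks a run start, -1 marks the position just past its end.
        edges = np.diff(np.concatenate(([False], m, [False])).astype(np.int8))
        starts = np.flatnonzero(edges == 1)
        ends = np.flatnonzero(edges == -1)
        if starts.size:
            best = max(best, int((ends - starts).max()))
    return best
```

The loop is only over columns, so the per-row work is fully vectorized; on a tall, narrow frame this should scale well.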