简体   繁体   中英

Count consecutive values above threshold for all columns in dataframe

I have the following df:

       A      B      C
0     47     50     22
1     47     53     28
2     47     65     33
3     47     62     33
4     47     62     33
5     47     62     33
6     47     62     33
7     47     62     33

I'd like to create an array with the amount of times that any value repeats itself more than 5 times and sum it. In the above example the output that I'd like it would be:

output = [8 5 6]

Is there a way to do it using pandas methods? I almost got the correct output but using for loops and it takes too long since I have +1000 columns and almost 7000 rows.

Is this what you need ?

s=df.melt()
s.value.value_counts().loc[lambda x : x>=5]
47    8
33    6
62    5
Name: value, dtype: int64

Update

s=df.melt()
s.groupby(list(s)).size().loc[lambda x : x>=5]
variable  value
A         47       8
B         62       5
C         33       6
dtype: int64

IIUC, you want to find the consecutive blocks within each column and add their sizes up:

# compare cells with previous 
# and stack into a series
s = df.eq(df.shift()).melt()

# groups of consecutive equal values
groups = (~s['value']).groupby(s['variable']).cumsum()

(s.groupby(['variable', groups])
  .size()
  .loc[lambda x: x>=5]          # stole from BEN's answer
  .groupby(level=0).sum()       # replace this line with sum(level=0) would also work
)

Output:

variable
A    8
B    5
C    6
dtype: int64

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM