
How to count consecutive string values of one column grouped by column values of another in a dataframe?

I have the following dataframe:


| Levels | Labels | Confidence |
|--------|--------|------------|
| 0      | Hands  | 0.8        |
| 0      | Leg    | 0.7        |
| 0      | Eye    | 0.9        |
| 1      | Ear    | 0.9        |
| 1      | Eye    | 0.8        |
| 2      | Hands  | 0.9        |
| 2      | Eye    | 0.8        |
| 3      | Eye    | 0.8        |
:
:
: 

I want to check whether any of my labels appear in consecutive levels (0, 1, 2, 3, 4, 5, ...) and, if so, for how many consecutive levels, that is, the length of the run of consecutive levels for each body part. In the example dataset above, the label "Eye" is present in 4 consecutive levels, "Hands" in only 1, and so on.

There is a similar question here: How to find the count of consecutive same string values in a pandas dataframe?
Modifying the solution there did not work for me. I also tried converting the data to a NumPy array, which did not work either.

Could you take a look at this?

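This should work: just define a custom aggregation function. Within each label, it marks the start of a new run wherever a level is not exactly one more than the previous level, numbers the runs with a cumulative sum, and takes the size of the largest run.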

import pandas as pd

df = pd.DataFrame({
    'lvl': [0, 0, 0, 1, 1, 2, 2, 3, 3, 3, 4],
    'label': ['a', 'b', 'c', 'a', 'b', 'a', 'c', 'a', 'b', 'c', 'c'],
    'confidence': [0.1, 0.5, 0.3, 0.6, 0.2, 0.4, 0.7, 0.8, 0.5, 0.2, 0.8]
})


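# For each label, find the longest run of consecutive levels.
# A new run starts wherever a level differs from the previous level + 1;
# cumsum() numbers the runs and value_counts().max() is the longest one.
# Assumes levels are sorted and not repeated within a label.
agg_func = {
    'lvl': [('length', lambda x: x.ne((x+1).shift()).cumsum().value_counts().max())]
}

result = df.groupby('label').agg(agg_func)
# Drop the outer 'lvl' level so the only column is named 'length'.
result.columns = result.columns.droplevel(0)

print(result)

Output:
       length
label        
a           4
b           2
c           3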
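
To make the one-line lambda easier to follow, here is a minimal sketch that spells out the same idea step by step and applies it to the rows visible in the question's table. The helper name longest_run is hypothetical, and the sorting and de-duplication step is an added assumption (it guards against a label listed twice at one level or rows that are out of order); it is not part of the answer above. The elided rows of the table are not guessed.

```python
import pandas as pd

# Only the rows visible in the question's table; the elided rows are left out.
df = pd.DataFrame({
    "Levels": [0, 0, 0, 1, 1, 2, 2, 3],
    "Labels": ["Hands", "Leg", "Eye", "Ear", "Eye", "Hands", "Eye", "Eye"],
    "Confidence": [0.8, 0.7, 0.9, 0.9, 0.8, 0.9, 0.8, 0.8],
})


def longest_run(levels: pd.Series) -> int:
    """Length of the longest run of consecutive integer levels (hypothetical helper)."""
    # Assumption: drop repeats and sort so duplicates or unordered rows cannot split a run.
    s = levels.drop_duplicates().sort_values()
    # True wherever a level is not exactly previous + 1, i.e. a new run starts.
    # The first element is compared against NaN, so it always starts a run.
    starts = s.ne(s.shift() + 1)
    # The cumulative sum gives every row the id of the run it belongs to.
    run_id = starts.cumsum()
    # The size of the largest run is the answer for this label.
    return int(run_id.value_counts().max())


result = df.groupby("Labels")["Levels"].agg(longest_run)
print(result)
```

On these rows, "Eye" gets 4 and "Ear", "Hands" and "Leg" each get 1, which matches what the question expects for "Eye" and "Hands".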
