I am quite new to pandas. I am trying to count the length of the first consecutive run of color for each car in this DataFrame:
car color
0 audi black
1 audi black
2 audi blue
3 audi black
4 bmw blue
5 bmw green
6 bmw blue
7 bmw blue
8 fiat green
9 fiat green
10 fiat green
11 fiat blue
Thanks to jezrael, I have code that counts the total number of times each car's first color appears:
import pandas as pd
df = pd.DataFrame(data={
'car': ['audi', 'audi', 'audi', 'audi', 'bmw', 'bmw', 'bmw', 'bmw', 'fiat', 'fiat', 'fiat', 'fiat'],'color': ['black', 'black', 'blue', 'black', 'blue', 'green', 'blue', 'blue', 'green', 'green', 'green', 'blue']
})
df1 = (df.groupby('car')['color']
.transform('first')
.eq(df['color'])
.astype(int)
.groupby(df['car'])
.sum()
.reset_index(name='colour_cars'))
print(df1)
This works well for counting the total:
car colour_cars
0 audi 3
1 bmw 3
2 fiat 3
But it turns out that what I really need is the length of the first consecutive run only, so the result should be:
car colour_cars
0 audi 2
1 bmw 1
2 fiat 3
I have tried to use an apply function to stop the Series .sum() as soon as .eq returns a False. Any help finding a way to break the count once a False is returned from .eq would be greatly appreciated.
Use:
df = (df.groupby(['car', df.color.ne(df.color.shift()).cumsum()])
.size()
.reset_index(level=1, drop=True)
.reset_index(name='colour_cars')
.drop_duplicates('car'))
print (df)
car colour_cars
0 audi 2
3 bmw 1
6 fiat 3
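Details:
Create a helper Series that labels each run of consecutive equal values in the color column, then group by car together with that helper and count the rows with GroupBy.size. Remove the index level created from the helper Series with the first DataFrame.reset_index (level=1, drop=True), convert the remaining car index to a column with the second reset_index, and finally keep only the first row per car with DataFrame.drop_duplicates. The helper Series looks like this: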
print (df.color.ne(df.color.shift()).cumsum())
0 1
1 1
2 2
3 3
4 4
5 5
6 6
7 6
8 7
9 7
10 7
11 8
Name: color, dtype: int32
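To make the details above easier to follow, here is a minimal sketch that splits the chained expression into separate steps on the question's data. The intermediate names (run_id, run_sizes, flat, result) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'car': ['audi'] * 4 + ['bmw'] * 4 + ['fiat'] * 4,
    'color': ['black', 'black', 'blue', 'black',
              'blue', 'green', 'blue', 'blue',
              'green', 'green', 'green', 'blue'],
})

# Step 1: helper Series. A new id starts whenever color differs from the previous row.
run_id = df['color'].ne(df['color'].shift()).cumsum()

# Step 2: number of rows in every (car, run) group.
run_sizes = df.groupby(['car', run_id]).size()

# Step 3: drop the helper index level, then turn the car index into a column.
flat = (run_sizes.reset_index(level=1, drop=True)
                 .reset_index(name='colour_cars'))

# Step 4: the first row per car is the first consecutive run.
result = flat.drop_duplicates('car')
print(result)
```

Because car is one of the grouping keys, a run can never span two cars, even if the last color of one car equals the first color of the next.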
Here is a slightly different approach:
# get group ids based on whether the car or the color changes from one row to the next
df = df.assign(group_id=(df.shift(1) != df).any(axis=1).cumsum())
# group and get len of consecutive identical pairs
df = df.join(df.groupby('group_id').apply(len).rename('consec_len'), on='group_id')
# select first length for each car
df1 = df.groupby('car').consec_len.first()
df1
# returns
car
audi 2
bmw 1
fiat 3
Name: consec_len, dtype: int64
You could do:
# group by car and consecutive group of colors (compute count)
counts = df.groupby(['car', df.color.ne(df.color.shift()).cumsum()], as_index=False).count()
# fetch only the count corresponding to the first consecutive group of colors
result = counts[~counts.car.duplicated()].rename(columns={'color' : 'colour_cars'})
print(result)
Output
car colour_cars
0 audi 2
3 bmw 1
6 fiat 3
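The idea from the question, stopping the count as soon as .eq returns a False, can also be sketched directly. This is not taken from any of the answers above; it is one possible approach, assuming a per-car cumulative minimum (GroupBy.cummin) is acceptable as the "break" step. The names matches_first and in_first_run are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'car': ['audi'] * 4 + ['bmw'] * 4 + ['fiat'] * 4,
    'color': ['black', 'black', 'blue', 'black',
              'blue', 'green', 'blue', 'blue',
              'green', 'green', 'green', 'blue'],
})

# 1 where a row's color equals the first color of its car, else 0.
matches_first = (df.groupby('car')['color']
                   .transform('first')
                   .eq(df['color'])
                   .astype(int))

# Within each car, the cumulative minimum stays 1 until the first 0
# and is 0 from then on, so later matches are no longer counted.
in_first_run = matches_first.groupby(df['car']).cummin()

result = (in_first_run.groupby(df['car'])
                      .sum()
                      .reset_index(name='colour_cars'))
print(result)
```

This keeps the structure of the original transform('first') / .eq attempt and only adds the cummin step before summing.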