I want to cut continuous data into groups. I have some data like this:
Index Age Predict
0 23 0
1 39 0
2 70 0
3 41 1
4 50 0
5 17 0
6 29 1
I tried:
df_1 = df[['Age','Predict']]
data = df_1.sort_values(by='Age')
After sorting:
Index Age Predict
5 17 0
0 23 0
6 29 1
1 39 0
3 41 1
4 50 0
2 70 0
How can I split the data into groups like this:
Index Age Predict
group 1:
5 17 0
0 23 0
group 2:
6 29 1
group 3:
1 39 0
group 4:
3 41 1
group 5:
4 50 0
2 70 0
Thanks for the help.
IIUC, the groups you want are defined by Predict: a new group starts wherever the difference between consecutive rows is not equal to 0. So you could create a column:
data_ = df.sort_values('Age')
data_['gr'] = data_['Predict'].diff().ne(0).cumsum()
print (data_)
Index Age Predict gr
5 5 17 0 1
0 0 23 0 1
6 6 29 1 2
1 1 39 0 3
3 3 41 1 4
4 4 50 0 5
2 2 70 0 5
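To see why diff().ne(0).cumsum() produces these group numbers, here is a minimal sketch that rebuilds the sample data from the question and prints each intermediate step. The variable names step_diff, changed, group_id and steps are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Age": [23, 39, 70, 41, 50, 17, 29],
    "Predict": [0, 0, 0, 1, 0, 0, 1],
})

data_ = df.sort_values("Age")

# Step 1: difference to the previous row (NaN for the first row)
step_diff = data_["Predict"].diff()
# Step 2: True wherever the value changed; NaN != 0 is True,
# so the first row always starts a group
changed = step_diff.ne(0)
# Step 3: the running count of changes is the group number
group_id = changed.cumsum()

steps = pd.DataFrame({
    "Predict": data_["Predict"],
    "diff": step_diff,
    "changed": changed,
    "gr": group_id,
})
print(steps)
```

Each True in the changed column bumps the counter by one, so consecutive rows with the same Predict value share a group number.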
Or, if you want to split your data without creating the group column, one way is to create a dictionary that contains each group:
data_ = df.sort_values('Age')
d = {i: dfg
     for i, (_, dfg) in enumerate(data_.groupby(data_['Predict'].diff().ne(0).cumsum()), 1)}
print (d[1])
Index Age Predict
5 5 17 0
0 0 23 0
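As a rough illustration of working with such a dictionary, the sketch below rebuilds the sample data and collects the Age values of each group. It uses the group labels returned by groupby directly as keys (they already start at 1 here), which is a small variation on the enumerate version above; the names key and ages are hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Age": [23, 39, 70, 41, 50, 17, 29],
    "Predict": [0, 0, 0, 1, 0, 0, 1],
})

data_ = df.sort_values("Age")
key = data_["Predict"].diff().ne(0).cumsum()

# One sub-DataFrame per run of identical Predict values
d = {i: dfg for i, dfg in data_.groupby(key)}

# Example use: list the ages that fall into each group
ages = {i: dfg["Age"].tolist() for i, dfg in d.items()}
print(ages)
```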
df.groupby((df['Predict'] != df['Predict'].shift(1)).cumsum())
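Basically, check whether the current value differs from the previous one and, if it does, increment the group counter. This lets you group by changes in the value of Predict. Note that df must already be sorted by Age for the groups to match the ones in the question.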
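A small self-contained sketch of the shift-based comparison, assuming the sample data from the question and sorting by Age before grouping. The names key and sizes are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Age": [23, 39, 70, 41, 50, 17, 29],
    "Predict": [0, 0, 0, 1, 0, 0, 1],
})

# Sort first, so that "previous row" means "next younger person"
data = df.sort_values("Age")

# True where Predict differs from the previous row; the first row is
# compared against NaN, so it always starts a new group
key = (data["Predict"] != data["Predict"].shift()).cumsum()

# Number of rows in each group
sizes = data.groupby(key).size()
print(sizes.to_dict())
```

Running the same comparison on the unsorted df would group by the original row order instead, which gives a different split.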
Using .groupby and .cumsum():
for i, grp in data.groupby((data['Predict'] != data['Predict'].shift()).cumsum()):
    print('group', i)
    print(grp)
Result:
group 1
Age Predict
5 17 0
0 23 0
group 2
Age Predict
6 29 1
group 3
Age Predict
1 39 0
group 4
Age Predict
3 41 1
group 5
Age Predict
4 50 0
2 70 0