I have a DataFrame and want to drop specific rows per group ("id"):
id - month - max
1 - 112016 - 41
1 - 012017 - 46
1 - 022017 - 156
1 - 032017 - 164
1 - 042017 - 51
2 - 042017 - 26
2 - 052017 - 156
2 - 062017 - 17
Expected result:
id - month - max
1 - 112016 - 41
1 - 012017 - 46
2 - 042017 - 26
I'm able to identify the first row which has to be deleted per group, but I'm stuck from that point on:
df[df['max'] > 62].sort_values('month').groupby('id', as_index=False).first()
How can I get rid of the rows?
Best regards, david
Use:
#convert to datetimes
df['month'] = pd.to_datetime(df['month'], format='%m%Y')
#sorting per groups if necessary
df = df.sort_values(['id','month'])
#compare with gt (>), take the cumulative sum per group and keep rows where it is still 0
df1 = df[df['max'].gt(62).groupby(df['id']).cumsum().eq(0)]
print (df1)
   id      month  max
0   1 2016-11-01   41
1   1 2017-01-01   46
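To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. It assumes the values from the question's table (id 2 starts at 26, whereas the printed output above comes from a frame where id 2 starts at 83), so id 2 keeps its first row here. The names `over`, `seen` and `kept` are only illustrative.

```python
import pandas as pd

# Values copied from the question's table (assumption: id 2 starts at 26).
df = pd.DataFrame({
    'id':    [1, 1, 1, 1, 1, 2, 2, 2],
    'month': ['112016', '012017', '022017', '032017', '042017',
              '042017', '052017', '062017'],
    'max':   [41, 46, 156, 164, 51, 26, 156, 17],
})
df['month'] = pd.to_datetime(df['month'], format='%m%Y')
df = df.sort_values(['id', 'month'])

# Step 1: True wherever the threshold is exceeded.
over = df['max'].gt(62)
# Step 2: running count of exceedances within each id.
seen = over.groupby(df['id']).cumsum()
# Step 3: keep only rows where no exceedance has happened yet.
kept = df[seen.eq(0)]
print(kept)
```

Because the count never decreases, every row after the first value above 62 is dropped as well, even if its own value is below the threshold (for example 51 in id 1).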
Or use a custom function if you also need to keep the first value > 62:
#convert to datetimes
df['month'] = pd.to_datetime(df['month'], format='%m%Y')
#sorting per groups if necessary
df = df.sort_values(['id','month'])
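def f(x):
    #boolean mask of values above the threshold
    m = x['max'].gt(62)
    #no value above the threshold - keep the whole group
    if not m.any():
        return x
    #index label of the first value above the threshold
    first = m[m].index[0]
    #.loc slicing is inclusive, so this row is kept too
    return x.loc[:first]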
df = df.groupby('id', group_keys=False).apply(f)
print (df)
   id      month  max
0   1 2016-11-01   41
1   1 2017-01-01   46
2   1 2017-02-01  156
5   2 2017-04-01   83
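As a possible alternative to the custom function, the same selection can probably be done without `apply`. This is only a sketch, not part of the original answer: it assumes the rows are already sorted by month within each id, and the names `over`, `prior` and `result` are illustrative.

```python
import pandas as pd

# Same values as the frame printed above (id 2 starts at 83);
# the month column is omitted because the rows are already in order.
df = pd.DataFrame({
    'id':  [1, 1, 1, 1, 1, 2, 2, 2],
    'max': [41, 46, 156, 164, 51, 83, 156, 17],
})

# True wherever the threshold is exceeded.
over = df['max'].gt(62)
# Number of exceedances strictly before the current row within each id:
# the running count minus the current row's own contribution.
prior = over.groupby(df['id']).cumsum() - over.astype(int)
# Keep rows with no earlier exceedance, which includes the first one itself.
result = df[prior.eq(0)]
print(result)
```

Groups that never exceed the threshold have `prior` equal to 0 everywhere, so they are kept whole without a special case.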
import pandas as pd
datadict = {
    'id': [1, 1, 1, 1, 1, 2, 2, 2],
    'max': [41, 46, 156, 164, 51, 83, 156, 17],
    'month': ['112016', '012017', '022017', '032017', '042017', '042017', '052017', '062017'],
}
df = pd.DataFrame(datadict)
print (df)
   id  max   month
0   1   41  112016
1   1   46  012017
2   1  156  022017
3   1  164  032017
4   1   51  042017
5   2   83  042017
6   2  156  052017
7   2   17  062017
df = df.loc[df['max']>62,:]
print (df)
   id  max   month
2   1  156  022017
3   1  164  032017
5   2   83  042017
6   2  156  052017