I have a dataframe like this:
id date event name time
1 2016-10-01 A leader 12:45
2 2016-10-01 A AA 12:87
3 2016-10-01 A BB 12:45
There are rows for each member in the event, but one row has the leader data as well. I want to exclude the rows with the data about the leader and add a column is_leader
to indicate whether a member is the leader or not. Something like this:
id date event name time is_leader
2 2016-10-01 A AA 12:87 0
3 2016-10-01 A BB 12:45 1
So, I know at id=3
is the leader based on the time, which is 12:45 for both here. We can assume that this time won't be the same for any other members.
What is an efficient way to accomplish this in pandas. Here I have just one event as an example, but I'll have several of these and I need to do this for each event.
You can use groupby
with custom function f
which return new column is_leader
with True
for all rows where is same time
as time
of row with text leader
in column name
:
print (df)
id date event name time
0 1 2016-10-01 A leader 12:45
1 2 2016-10-01 A AA 12:87
2 3 2016-10-01 A BB 12:45
3 1 2016-10-01 B leader 12:15
4 2 2016-10-01 B AA 12:15
5 3 2016-10-01 B BB 12:45
def f(x):
x['is_leader'] = x.time == x.ix[x['name'] == 'leader', 'time'].iloc[0]
return x
df= df.groupby('event').apply(f)
print (df)
id date event name time is_leader
0 1 2016-10-01 A leader 12:45 True
1 2 2016-10-01 A AA 12:87 False
2 3 2016-10-01 A BB 12:45 True
3 1 2016-10-01 B leader 12:15 True
4 2 2016-10-01 B AA 12:15 True
5 3 2016-10-01 B BB 12:45 False
One row solution with lambda function:
df['is_leader'] = df.groupby('event')
.apply(lambda x: x.time == x.ix[x['name'] == 'leader', 'time'].iloc[0])
.reset_index(drop=True, level=0)
print (df)
id date event name time is_leader
0 1 2016-10-01 A leader 12:45 True
1 2 2016-10-01 A AA 12:87 False
2 3 2016-10-01 A BB 12:45 True
3 1 2016-10-01 B leader 12:15 True
4 2 2016-10-01 B AA 12:15 True
5 3 2016-10-01 B BB 12:45 False
Then remove rows with leader
by boolean indexing
and cast boolean
column to int
:
df = df[df.name != 'leader']
df.is_leader = df.is_leader.astype(int)
print (df)
id date event name time is_leader
1 2 2016-10-01 A AA 12:87 0
2 3 2016-10-01 A BB 12:45 1
4 2 2016-10-01 B AA 12:15 1
5 3 2016-10-01 B BB 12:45 0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.