I have a dataframe:
    Name  StartPoint  EndPoint  isDelivered  Customer
 0     A           1         4            0        C1
 1     A           1         4            0        C1
 2     A           2         5            1        C1
 3     A           3         5            0        C1
 4     A           3         6            0        C1
 5     A           3         6            1        C1
 6     B           1         4            0        C2
 7     B           1         4            0        C2
 8     B           2         5            1        C2
 9     B           3         5            1        C2
10     B           3         6            1        C2
11     B           3         8            0        C2
12     B           3         8            1        C2
I want to group by Name, and each group should keep only the rows that satisfy either of the following conditions:
1. Min value in StartPoint
2. Max value in EndPoint and value 1 in column isDelivered
This is what I have done:
groups = df.groupby(['Name']).StartPoint
groups1 = df.groupby(['Name']).EndPoint
min_StartPoint = groups.transform(min)
max_EndPoint = groups1.transform(max)
df1 = df[(df.StartPoint==min_StartPoint)|(df.EndPoint==max_EndPoint)]
The result obtained is:
    Name  StartPoint  EndPoint  isDelivered  Customer
 0     A           1         4            0        C1
 1     A           1         4            0        C1
 4     A           3         6            0        C1
 5     A           3         6            1        C1
 6     B           1         4            0        C2
 7     B           1         4            0        C2
11     B           3         8            0        C2
12     B           3         8            1        C2
But rows 4 and 11 do not have value 1 in isDelivered,
and hence they do not satisfy the second condition.
My desired result is:
    Name  StartPoint  EndPoint  isDelivered  Customer
 0     A           1         4            0        C1  # Min value in StartPoint
 1     A           1         4            0        C1  # Min value in StartPoint
 5     A           3         6            1        C1  # Max value in EndPoint and 1 in isDelivered
 6     B           1         4            0        C2  # Min value in StartPoint
 7     B           1         4            0        C2  # Min value in StartPoint
12     B           3         8            1        C2  # Max value in EndPoint and 1 in isDelivered
Is there a way to achieve this using my current solution?
You never incorporated both clauses of the second condition. Make your code reflect the wording as given: the saved rows must match one of the two conditions, and the second condition has two clauses that must both hold:
df1 = df[(df.StartPoint == min_StartPoint) |
         ((df.EndPoint == max_EndPoint) & (df.isDelivered == 1))]
I'd be happy to show you actual output, but you did not provide the expected MRE (Minimal, Reproducible Example).
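One detail worth checking when combining masks like this: in Python, `&` binds more tightly than `==`, so each comparison needs its own parentheses. The sketch below uses only the standard library to show how an unparenthesized expression is grouped. The variable names (`end`, `max_end`, `delivered`, `flag_value`) are placeholders chosen for illustration, not names from the question.

```python
import ast


def same_parse(expr_a, expr_b):
    """Return True when both expressions produce the same syntax tree."""
    tree_a = ast.dump(ast.parse(expr_a, mode="eval"))
    tree_b = ast.dump(ast.parse(expr_b, mode="eval"))
    return tree_a == tree_b


# Placeholder expressions; the names are never evaluated, only parsed.
unparenthesized = "(end == max_end) & delivered == 1"
grouped_left = "((end == max_end) & delivered) == 1"
intended = "(end == max_end) & (delivered == 1)"

# Python groups the unparenthesized form as ((...) & delivered) == 1.
parses_as_grouped_left = same_parse(unparenthesized, grouped_left)
parses_as_intended = same_parse(unparenthesized, intended)

# The same effect with plain values (flag_value is a made-up non-0/1 flag):
flag_value = 2
loose = True & flag_value == 2     # evaluated as (True & 2) == 2
strict = True & (flag_value == 2)  # evaluated as True & True

print(parses_as_grouped_left, parses_as_intended, loose, strict)
```

With a strictly 0/1 column the two groupings may happen to select the same rows, but relying on that is fragile, so parenthesizing every comparison is the safer habit.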
You can apply a filter to each group:
df.groupby(['Name'], group_keys=False).apply(
    lambda g: g[(g.StartPoint == g.StartPoint.min()) |
                ((g.EndPoint == g.EndPoint.max()) & (g.isDelivered == 1))])
Output:
    Name  StartPoint  EndPoint  isDelivered  Customer
 0     A           1         4            0        C1
 1     A           1         4            0        C1
 5     A           3         6            1        C1
 6     B           1         4            0        C2
 7     B           1         4            0        C2
12     B           3         8            1        C2
You are on the right track; you just need to add an extra condition to the last line of your code:
df1 = df[(df.StartPoint == min_StartPoint) |
         ((df.EndPoint == max_EndPoint) & (df.isDelivered == 1))]
Output:
    Name  StartPoint  EndPoint  isDelivered  Customer
 0     A           1         4            0        C1
 1     A           1         4            0        C1
 5     A           3         6            1        C1
 6     B           1         4            0        C2
 7     B           1         4            0        C2
12     B           3         8            1        C2
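For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question and applies the transform-based mask. The names `by_name`, `min_start`, `max_end`, `mask`, and `kept_rows` are my own, and the string aliases `"min"` and `"max"` are used in place of the built-in functions, which is an assumption about style rather than something the question requires.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Name": ["A"] * 6 + ["B"] * 7,
    "StartPoint": [1, 1, 2, 3, 3, 3, 1, 1, 2, 3, 3, 3, 3],
    "EndPoint": [4, 4, 5, 5, 6, 6, 4, 4, 5, 5, 6, 8, 8],
    "isDelivered": [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
    "Customer": ["C1"] * 6 + ["C2"] * 7,
})

# Per-group extremes, broadcast back to the original row index.
by_name = df.groupby("Name")
min_start = by_name["StartPoint"].transform("min")
max_end = by_name["EndPoint"].transform("max")

# Keep a row if it has the group's minimum StartPoint, OR if it has the
# group's maximum EndPoint AND was delivered.
mask = (df["StartPoint"] == min_start) | (
    (df["EndPoint"] == max_end) & (df["isDelivered"] == 1)
)
df1 = df[mask]

kept_rows = df1.index.tolist()
print(df1)
```

Because `transform` returns a Series aligned with `df`, the mask can be applied directly without a per-group `apply`.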