
Pandas: Groupby and drop rows in each group based on condition

I have a dataframe:

   Name  StartPoint  EndPoint  isDelivered Customer
0     A           1         4            0       C1
1     A           1         4            0       C1
2     A           2         5            1       C1
3     A           3         5            0       C1
4     A           3         6            0       C1
5     A           3         6            1       C1
6     B           1         4            0       C2
7     B           1         4            0       C2
8     B           2         5            1       C2
9     B           3         5            1       C2
10    B           3         6            1       C2
11    B           3         8            0       C2
12    B           3         8            1       C2
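A minimal sketch that rebuilds the sample frame above as a reproducible DataFrame. It assumes the values are exactly as shown and the default integer index; the dict-of-lists layout is just one convenient way to type it in:

```python
import pandas as pd

# Sample data transcribed from the table above (default RangeIndex 0..12)
df = pd.DataFrame({
    "Name": ["A"] * 6 + ["B"] * 7,
    "StartPoint": [1, 1, 2, 3, 3, 3, 1, 1, 2, 3, 3, 3, 3],
    "EndPoint": [4, 4, 5, 5, 6, 6, 4, 4, 5, 5, 6, 8, 8],
    "isDelivered": [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
    "Customer": ["C1"] * 6 + ["C2"] * 7,
})
print(df)
```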

I want to group by Name so that each group keeps only the rows that satisfy either of the following conditions:

  1. Minimum value in column StartPoint
  2. Maximum value in column EndPoint and value 1 in column isDelivered

This is what I have done:

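# Per-row group minimum of StartPoint and group maximum of EndPoint
groups = df.groupby(['Name']).StartPoint
groups1 = df.groupby(['Name']).EndPoint
min_StartPoint = groups.transform('min')
max_EndPoint = groups1.transform('max')
# Keep rows matching either condition (isDelivered is not checked yet)
df1 = df[(df.StartPoint==min_StartPoint)|(df.EndPoint==max_EndPoint)]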

The result obtained is:

   Name  StartPoint  EndPoint  isDelivered Customer
0     A           1         4            0       C1
1     A           1         4            0       C1
4     A           3         6            0       C1
5     A           3         6            1       C1
6     B           1         4            0       C2
7     B           1         4            0       C2
11    B           3         8            0       C2
12    B           3         8            1       C2

But rows 4 and 11 do not have a value of 1 in isDelivered, so they do not satisfy the second condition.
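To confirm which rows break the second condition, a sketch on the sample data can rebuild the two per-row group statistics with `transform` and list the rows that the original filter keeps only through the `EndPoint` clause while `isDelivered` is 0. The names `start_ok`, `end_ok` and `offending` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A"] * 6 + ["B"] * 7,
    "StartPoint": [1, 1, 2, 3, 3, 3, 1, 1, 2, 3, 3, 3, 3],
    "EndPoint": [4, 4, 5, 5, 6, 6, 4, 4, 5, 5, 6, 8, 8],
    "isDelivered": [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
    "Customer": ["C1"] * 6 + ["C2"] * 7,
})

# transform broadcasts each group's statistic back onto every row of that group
min_start = df.groupby("Name")["StartPoint"].transform("min")
max_end = df.groupby("Name")["EndPoint"].transform("max")

start_ok = df["StartPoint"] == min_start  # condition 1
end_ok = df["EndPoint"] == max_end        # only the first half of condition 2

# Rows kept by the original filter solely via end_ok although isDelivered is 0
offending = df[~start_ok & end_ok & (df["isDelivered"] == 0)].index.tolist()
print(offending)  # [4, 11]
```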

My desired result is:

   Name  StartPoint  EndPoint  isDelivered Customer
0     A           1         4            0       C1 # Min value in StartPoint
1     A           1         4            0       C1 # Min value in StartPoint
5     A           3         6            1       C1 # Max value in EndPoint and 1 in isDelivered
6     B           1         4            0       C2 # Min value in StartPoint
7     B           1         4            0       C2 # Min value in StartPoint
12    B           3         8            1       C2 # Max value in EndPoint and 1 in isDelivered

Is there a way to achieve this using my current solution?

You never incorporated the isDelivered part of the second condition. Make your code reflect the conditions as worded: a kept row must match at least one of the two conditions:

df1 = df[(df.StartPoint == min_StartPoint) |
         ((df.EndPoint == max_EndPoint) & (df.isDelivered == 1))]
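Parenthesize `df.isDelivered == 1` when combining it with `&`: in Python's grammar `&` binds more tightly than `==`, so `a & b == 1` is parsed as `(a & b) == 1`, and pandas Series expressions are subject to the same precedence. A minimal scalar illustration of the difference:

```python
# & binds tighter than ==, so "a & b == 1" parses as "(a & b) == 1"
a, b = True, 3

unparenthesized = a & b == 1   # (True & 3) == 1  ->  1 == 1  ->  True
intended = a & (b == 1)        # True & False     ->  False

print(unparenthesized, intended)  # True False
```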

I'd be happy to show you the actual output, but you did not provide a minimal, reproducible example; see MRE - Minimal, Reproducible Example.

You can apply a filter to each group:

df.groupby(['Name'], group_keys=False).apply(
    lambda g:g[(g.StartPoint == g.StartPoint.min()) |
               ((g.EndPoint == g.EndPoint.max()) & (g.isDelivered == 1))])

Output:

    Name    StartPoint  EndPoint    isDelivered Customer
0   A       1           4           0           C1
1   A       1           4           0           C1
5   A       3           6           1           C1
6   B       1           4           0           C2
7   B       1           4           0           C2
12  B       3           8           1           C2
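The same per-group filter can also be sketched as an explicit loop over the groups followed by `pd.concat`, which keeps the original index labels. This is a swapped-in equivalent, not the code above; one reason to consider it is that recent pandas versions (2.2 and later) emit a deprecation warning when `groupby(...).apply` operates on the grouping columns. Sample data as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A"] * 6 + ["B"] * 7,
    "StartPoint": [1, 1, 2, 3, 3, 3, 1, 1, 2, 3, 3, 3, 3],
    "EndPoint": [4, 4, 5, 5, 6, 6, 4, 4, 5, 5, 6, 8, 8],
    "isDelivered": [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
    "Customer": ["C1"] * 6 + ["C2"] * 7,
})

parts = []
for _, g in df.groupby("Name"):
    # Condition 1: minimum StartPoint within the group
    is_min_start = g["StartPoint"] == g["StartPoint"].min()
    # Condition 2: maximum EndPoint within the group AND delivered
    is_max_end_delivered = (g["EndPoint"] == g["EndPoint"].max()) & (g["isDelivered"] == 1)
    parts.append(g[is_min_start | is_max_end_delivered])

result = pd.concat(parts)  # original index labels are preserved
print(result)
```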

You are on the right track; you just need to add the isDelivered condition to the last line of code:

df1 = df[(df.StartPoint==min_StartPoint)|((df.EndPoint==max_EndPoint)
                                         & (df.isDelivered == 1))]

Output:

   Name  StartPoint  EndPoint  isDelivered Customer
0     A           1         4            0       C1
1     A           1         4            0       C1
5     A           3         6            1       C1
6     B           1         4            0       C2
7     B           1         4            0       C2
12    B           3         8            1       C2
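For reference, a self-contained version of this `transform`-based filter on the question's sample data. `transform` returns a Series aligned to `df`'s index, so each row is compared against its own group's minimum or maximum without a Python-level loop over groups; `filtered` is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A"] * 6 + ["B"] * 7,
    "StartPoint": [1, 1, 2, 3, 3, 3, 1, 1, 2, 3, 3, 3, 3],
    "EndPoint": [4, 4, 5, 5, 6, 6, 4, 4, 5, 5, 6, 8, 8],
    "isDelivered": [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1],
    "Customer": ["C1"] * 6 + ["C2"] * 7,
})

grouped = df.groupby("Name")
min_StartPoint = grouped["StartPoint"].transform("min")
max_EndPoint = grouped["EndPoint"].transform("max")

# Keep a row if it has its group's minimum StartPoint, or if it has its
# group's maximum EndPoint and was delivered
filtered = df[(df["StartPoint"] == min_StartPoint)
              | ((df["EndPoint"] == max_EndPoint) & (df["isDelivered"] == 1))]
print(filtered)
```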

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM