Pandas drop row based on groupby AND partial string match

Question

I have a large pandas DataFrame with numerous columns. I want to group by serial number AND where there are duplicates to keep the row where the product ID ends in -RF . The first part I can achieve with a groupby(subset='Serial Number') , however I'm at a loss as to how combine this and keep/drop row based on a regex ('-RF$') . How can I achieve this?

Input:

Serial Number	Product ID
ABC1745AABC	ABC-SUP2E-RF
ABC1745AABC	ABC-SUP2E
ABC1745AAFF	ABC-SUP2E
ABC1745AAFE	ABC-SUP2E
ABC1745AAB1	ABC-SUP2E-WS
ABC1745AAB1	ABC-SUP2E

Ultimately, I want to be left with something like this (output):

Serial Number	Product ID
ABC1745AABC	ABC-SUP2E-RF
ABC1745AAFF	ABC-SUP2E
ABC1745AAFE	ABC-SUP2E
ABC1745AAB1	ABC-SUP2E-WS
ABC1745AAB1	ABC-SUP2E

Data:

{'Serial Number': ['ABC1745AABC', 'ABC1745AABC', 'ABC1745AAFF', 'ABC1745AAFE'],
 'Product ID': ['ABC-SUP2E-RF', 'ABC-SUP2E', 'ABC-SUP2E', 'ABC-SUP2E']}

Answer 1

You could add a column to mark rows ending with "RF", then sort values to leave those rows at the top of each group. And finally just group and take the first row:

df["RF"] = df["Product ID"].str.endswith("-RF")
df = df.sort_values(["Serial Number", "RF"], ascending=False)
output = df.groupby("Serial Number").first()[["Serial Number", "Product ID"]]

Output:

  Serial Number    Product ID
2  ABC1745AAFF      ABC-SUP2E
3  ABC1745AAFE      ABC-SUP2E
0  ABC1745AABC   ABC-SUP2E-RF

Answer 2

Thanks for you help. I've solved it like this:

df = df.sort_values(["Serial Number", "Product ID"], ascending=(True, True))

df = df.drop_duplicates(subset=['Serial Number', 'Product Group'], keep='first')

Fortunately, the product ID I kept was the longest string in all cases. It would be good to find a solution for scenarios where this was not always the case.

Pandas drop row based on groupby AND partial string match

Question

2 answers

solution1
1 2022-03-14 04:51:18

solution2
0 2022-03-14 23:39:13

Pandas drop row based on groupby AND partial string match

Question

2 answers

solution1 1 2022-03-14 04:51:18

solution2 0 2022-03-14 23:39:13

solution1
1 2022-03-14 04:51:18

solution2
0 2022-03-14 23:39:13