Retain only duplicated rows in a pandas dataframe

Question

I have a dataframe with two columns: "Agent" and "Client". Each row corresponds to an interaction between an agent and a client.

I want to keep only the rows if a client had interactions with at least 2 agents.

How can I do that?

Answer 1

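Worth adding that you can now use df.duplicated(). Since the goal is to keep clients that appear more than once, check for duplicates on the "Client" column:

df = df.loc[df.duplicated(subset='Client', keep=False)]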
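To make the effect of keep=False concrete, here is a minimal sketch on made-up data. The sample values and the names mask and result are hypothetical, and it assumes the duplicate check should run on the "Client" column, as the question describes.

```python
import pandas as pd

# Hypothetical sample data: one row per Agent/Client interaction.
df = pd.DataFrame({
    "Agent": ["a1", "a2", "a3", "a1", "a4"],
    "Client": ["c1", "c1", "c2", "c3", "c3"],
})

# keep=False flags every row of a repeated Client value,
# not only the second and later occurrences.
mask = df.duplicated(subset="Client", keep=False)
result = df.loc[mask]
print(result)
```

With the default keep='first', the first row of each repeated client would be dropped from the result, which is usually not what is wanted here.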

Answer 2

Use groupby and transform with value_counts. Group on the "Client" column, since the condition is about how often each client appears:

df[df.Client.groupby(df.Client).transform('value_counts') > 1]

Note that, as mentioned here, one agent might interact with the same client multiple times. Those rows would be retained as false positives. If you do not want this, add a drop_duplicates call before filtering:

df = df.drop_duplicates()
df = df[df.Client.groupby(df.Client).transform('value_counts') > 1]

print(df)
   A  B
0  1  2
1  2  5
2  3  1
3  4  1
4  5  5
5  6  1

mask = df.B.groupby(df.B).transform('value_counts') > 1
print(mask)
0    False
1     True
2     True
3     True
4     True
5     True
Name: B, dtype: bool

df = df[mask]
print(df)
   A  B
1  2  5
2  3  1
3  4  1
4  5  5
5  6  1
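The false-positive case noted above (one agent seeing the same client several times) can be sketched as follows. This is an illustration on made-up data, not the answer's own code: it uses transform('size') for the row count, which may be a useful fallback if your pandas version does not accept 'value_counts' as a transform name, and transform('nunique') to count distinct agents per client. The sample values and the names by_size and by_agents are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the a1/c1 pair occurs twice on purpose.
df = pd.DataFrame({
    "Agent": ["a1", "a1", "a2", "a3", "a3", "a4"],
    "Client": ["c1", "c1", "c2", "c2", "c3", "c4"],
})

# Rows per client: c1 passes although only one agent was involved.
by_size = df[df.groupby("Client")["Client"].transform("size") > 1]

# Distinct agents per client: only c2 (a2 and a3) has at least two.
by_agents = df[df.groupby("Client")["Agent"].transform("nunique") >= 2]

print(by_size)
print(by_agents)
```

Counting distinct agents avoids the need for a separate drop_duplicates step and keeps the repeated interactions of qualifying clients in the result.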

2 answers

solution1
6 2022-03-09 16:05:37

solution2
1 ACCEPTED 2017-09-17 12:05:57
