Filter unique matches of multiple columns in dataframe with Pandas

Question

I am fairly new to pandas. I've tried several approaches to this problem using dataframe.merge and lambda logic, but I haven't found a solution that consistently produces the result I'm looking for. After filtering some data using

df = df.groupby(['0', '1']).size()
df = df.to_frame(name='2').reset_index()

I obtain the following table. The first two columns are the start and end points respectively, and the third is the number of times each pair occurred before the groupby:

I need to treat start -> end and end -> start as the same pair, meaning that the following dataframe:

0   1   2
a   d   8
d   a   2

should end up looking like this:

0   1   2
a   d   10

The original table should then end up looking like this:

I'm fairly sure there is an easy solution, but for the life of me I just can't pinpoint it.

Answer 1

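Sort the two endpoint columns within each row so that both directions produce the same key, then group by the sorted keys and sum the counts:

# Row-wise sort of the endpoints; 'expand' turns each sorted list into two columns
df1 = df[['0', '1']].apply(sorted, axis=1, result_type="expand").rename(columns={0: 'col1', 1: 'col2'})

# Select '2' before summing so that only the count column is aggregated
result = df.groupby([df1.col1, df1.col2])['2'].sum().reset_index()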
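A self-contained sketch of this approach may make it easier to try out. The sample rows and the names `keys` and `result` are made up for illustration and are not the asker's data; the column labels are assumed to be the strings '0', '1' and '2', as in the question's groupby call.

```python
import pandas as pd

# Hypothetical sample: (a, d)/(d, a) and (b, h)/(h, b) are the same pair in both directions.
df = pd.DataFrame({
    "0": ["a", "d", "b", "h", "c"],
    "1": ["d", "a", "h", "b", "e"],
    "2": [8, 2, 7, 1, 3],
})

# Sort the endpoints within each row so both directions map to the same key.
keys = df[["0", "1"]].apply(sorted, axis=1, result_type="expand")
keys.columns = ["col1", "col2"]

# Group the counts by the normalized keys and add them up.
result = df.groupby([keys["col1"], keys["col2"]])["2"].sum().reset_index()
print(result)
```

Summing only column '2' keeps the original string columns out of the aggregation, so the result holds just the normalized pair and its total.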

Answer 2

One option is to use apply to sort the values of the two columns within each row, then do another groupby. (Note that your column names may differ; my df was made using pd.read_clipboard().)

df.reset_index(inplace=True)

df[['0', '1']] = df[['0', '1']].apply(lambda x: sorted(x), axis=1).tolist()

df

    0   1   2
0   a   d   8
1   b   h   7
2   c   f   3
3   c   e   3
4   a   d   2
5   b   b   2
6   c   e   1
7   c   f   1
8   g   i   1
9   b   h   1
10  g   i   1

df.groupby(['0','1'], as_index=False).sum()

    0   1   2
0   a   d   10
1   b   b   2
2   b   h   8
3   c   e   4
4   c   f   4
5   g   i   2
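For larger frames, the row-wise apply can be slow. One possible alternative, not taken from either answer, is to sort the pairs with NumPy's `np.sort` along axis 1 and then group as before. The sample data and the names `pairs` and `out` below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with pairs that appear in both directions.
df = pd.DataFrame({
    "0": ["a", "d", "b", "h", "c"],
    "1": ["d", "a", "h", "b", "e"],
    "2": [8, 2, 7, 1, 3],
})

# Sort each row of the two endpoint columns in a single call on the 2-D array.
pairs = np.sort(df[["0", "1"]].to_numpy(), axis=1)

# Write the sorted pairs into a copy so the original frame is left untouched.
out = df.copy()
out[["0", "1"]] = pairs

# Sum the counts for each normalized pair.
out = out.groupby(["0", "1"], as_index=False)["2"].sum()
print(out)
```

This assumes both endpoint columns hold mutually comparable values (for example, all strings), since the sort compares the two entries of each row.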

2 answers

solution1
1 ACCEPTED 2020-09-11 23:37:18

solution2
0 2020-09-11 22:47:24
