Filter unique matches of multiple columns in dataframe with Pandas

I am fairly new to pandas and have tried several approaches to this problem using DataFrame.merge and lambda logic, but I haven't found a solution that consistently gives the result I'm looking for. After aggregating some data using

df = df.groupby(['0', '1']).size()
df = df.to_frame(name='2').reset_index()
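For readers unfamiliar with this step, here is a minimal sketch of what those two lines do. The `raw` frame and its values are made up for illustration and are not the asker's data:

```python
import pandas as pd

# Hypothetical raw data: one row per observed (start, end) pair.
raw = pd.DataFrame({
    "0": ["a", "a", "d", "b"],
    "1": ["d", "d", "a", "h"],
})

# size() counts the rows in each (start, end) group; to_frame/reset_index
# turn the resulting Series back into a flat table with the count in '2'.
counts = raw.groupby(["0", "1"]).size()
counts = counts.to_frame(name="2").reset_index()
print(counts)
```

In this sketch a -> d and d -> a are still counted as separate pairs.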

I obtain the following table. The first two columns are the start and end points, respectively, and the third is the number of times each pair occurred before the groupby:

0   1   2
a   d   8
b   h   7
c   f   3
c   e   3
d   a   2
b   b   2
e   c   1
f   c   1
g   i   1
h   b   1
i   g   1

I need to treat start -> end and end -> start as the same pair, meaning that the following dataframe:

0   1   2
a   d   8
d   a   2

should end up looking like this:

0   1   2
a   d   10

Applied to the original table, the result should look like this:

0   1   2
a   d   10
b   h   8
c   f   4
c   e   4
b   b   2
g   i   2

I'm fairly sure this should have an easy solution, but for the life of me I just can't pinpoint it.

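You can do it like this: sort each (start, end) pair so that both directions map to the same key, then group on the sorted pair and sum the counts. Selecting column '2' before summing keeps the string columns out of the aggregation.

df1 = df[['0', '1']].apply(sorted, axis=1, result_type="expand").rename(columns={0: 'col1', 1: 'col2'})

result = df.groupby([df1.col1, df1.col2])['2'].sum().reset_index()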
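The same idea can also be sketched without a row-wise apply, by sorting the two columns with NumPy. This is an alternative under stated assumptions, not the answerer's code: it assumes the column names '0', '1', '2' from the question and rebuilds the sample table by hand so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Sample table from the question, rebuilt by hand.
df = pd.DataFrame({
    "0": list("abccdbefghi"),
    "1": list("dhfeabccibg"),
    "2": [8, 7, 3, 3, 2, 2, 1, 1, 1, 1, 1],
})

# Sort each row's (start, end) pair so that d -> a becomes a -> d.
pairs = np.sort(df[["0", "1"]].to_numpy(), axis=1)

# Overwrite the two key columns with the sorted pair, then sum the counts.
result = (
    df.assign(**{"0": pairs[:, 0], "1": pairs[:, 1]})
      .groupby(["0", "1"], as_index=False)["2"]
      .sum()
)
print(result)
```

On large frames this may be faster than apply with axis=1, since the sorting happens inside NumPy rather than in a Python call per row.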

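Another option is to use apply to sort the values in each row of the two columns in place, then do another groupby. (Your column names may differ; my df was made using pd.read_clipboard(). The reset_index call is only needed when columns 0 and 1 are still in the index; if you already called reset_index as in the question, skip it, otherwise it adds an extra index column.)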

df.reset_index(inplace=True)

df[['0', '1']] = df[['0', '1']].apply(lambda x: sorted(x), axis=1).tolist()

df

    0   1   2
0   a   d   8
1   b   h   7
2   c   f   3
3   c   e   3
4   a   d   2
5   b   b   2
6   c   e   1
7   c   f   1
8   g   i   1
9   b   h   1
10  g   i   1

df.groupby(['0','1'], as_index=False).sum()

    0   1   2
0   a   d   10
1   b   b   2
2   b   h   8
3   c   e   4
4   c   f   4
5   g   i   2
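Note that groupby returns the rows sorted by key, whereas the table requested in the question is ordered by count, largest first. If that ordering matters, one possible follow-up step is sketched below; the `grouped` frame simply retypes the output printed above, and the alphabetical tie-breaking is an assumption rather than something the question specifies.

```python
import pandas as pd

# Grouped result as printed above (rows ordered by key).
grouped = pd.DataFrame({
    "0": list("abbccg"),
    "1": list("dbhefi"),
    "2": [10, 2, 8, 4, 4, 2],
})

# Largest counts first; ties are broken alphabetically by the pair.
ordered = (
    grouped.sort_values(["2", "0", "1"], ascending=[False, True, True])
           .reset_index(drop=True)
)
print(ordered)
```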
