I am fairly new to pandas and I've tried several approaches to this problem using dataframe.merge and lambda logic, but I haven't been able to find a solution that consistently gives the result I'm looking for. After filtering some data using
df = df.groupby(['0', '1']).size()
df = df.to_frame(name='2').reset_index()
I obtain the following table. The first two columns represent the starting and ending points respectively, and the third represents the number of times each pair occurred before the groupby:
0 1 2
a d 8
b h 7
c f 3
c e 3
d a 2
b b 2
e c 1
f c 1
g i 1
h b 1
i g 1
I need to consider both start -> end and end -> start points as the same, meaning that the following dataframe:
0 1 2
a d 8
d a 2
should end up looking like this:
0 1 2
a d 10
Going back to the original table, it should end up looking like this:
0 1 2
a d 10
b h 8
c f 4
c e 4
b b 2
g i 2
I'm fairly sure this should have an easy solution, but for the life of me I just can't pinpoint the answer.
You can do it like this:
df1 = df[['0', '1']].apply(sorted, 1, result_type = "expand").rename(columns = {0:'col1', 1:'col2'})
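# Select the count column explicitly so the string columns '0' and '1' are not summed (concatenated) as well
result = df.groupby([df1.col1, df1.col2])['2'].sum().reset_index()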
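For anyone who wants to try this without rebuilding the data, here is a minimal self-contained sketch of the same idea run against the sample table from the question. The names keys and result are just illustrative, and it assumes the column labels are the strings '0', '1' and '2' as in the question.

```python
import pandas as pd

# Sample data copied from the question (column labels are strings).
df = pd.DataFrame({
    '0': list('abccdbefghi'),
    '1': list('dhfeabccibg'),
    '2': [8, 7, 3, 3, 2, 2, 1, 1, 1, 1, 1],
})

# Sort each (start, end) pair so that a -> d and d -> a share the same key.
keys = df[['0', '1']].apply(sorted, axis=1, result_type='expand')
keys.columns = ['col1', 'col2']

# Group on the normalized pair and add up only the count column.
result = df.groupby([keys['col1'], keys['col2']])['2'].sum().reset_index()
print(result)
```

The rows come back sorted by the pair rather than in the original order, so add a sort_values call on '2' if you need the largest counts first.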
One option is to use apply to sort the values in the two columns row by row, then do another groupby. (Note that your column names may differ; my df was made using pd.read_clipboard().)
df.reset_index(inplace=True)
df[['0', '1']] = df[['0', '1']].apply(lambda x: sorted(x), axis=1).tolist()
df
0 1 2
0 a d 8
1 b h 7
2 c f 3
3 c e 3
4 a d 2
5 b b 2
6 c e 1
7 c f 1
8 g i 1
9 b h 1
10 g i 1
df.groupby(['0','1'], as_index=False).sum()
0 1 2
0 a d 10
1 b b 2
2 b h 8
3 c e 4
4 c f 4
5 g i 2
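If the frame is large, the row-wise apply can be slow. A possible alternative (not from the answers above, just a sketch under the same assumption that the columns are named '0', '1' and '2') is to sort each pair with numpy instead:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (column labels are strings).
df = pd.DataFrame({
    '0': list('abccdbefghi'),
    '1': list('dhfeabccibg'),
    '2': [8, 7, 3, 3, 2, 2, 1, 1, 1, 1, 1],
})

out = df.copy()
# np.sort with axis=1 orders the two endpoint labels within each row,
# so (d, a) becomes (a, d) without a Python-level loop over rows.
out[['0', '1']] = np.sort(out[['0', '1']].to_numpy(), axis=1)

# Same second groupby as above, summing only the count column.
out = out.groupby(['0', '1'], as_index=False)['2'].sum()
print(out)
```

This works on a copy, so the original df keeps its start/end direction if you still need it afterwards.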