Pandas dataframe frequencies

I have this dataframe:

   source target
0     ape    dog
1     ape   hous
2     dog   hous
3    hors    dog
4    hors    ape
5     dog    ape
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10   bird    ape
11   fist    ape
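
To make the example reproducible, the frame above can be rebuilt as follows. This is a minimal sketch: the values are copied from the printed table, and the variable name `df` simply matches the one used in the code below.

```python
import pandas as pd

# Edge list copied from the table above: one (source, target) pair per row.
df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})
```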

I am trying to generate a frequency count with this code:

df_count = df.groupby(['source', 'target']).size().reset_index().sort_values(0, ascending=False)
df_count.columns = ['source', 'target', 'weight']

I get the result below.

   source target  weight
2     ape   hous       2
0     ape   bird       1
1     ape    dog       1
3    bird    ape       1
4    bird   fist       1
5    bird   hous       1
6     dog    ape       1
7     dog   hous       1
8    fist    ape       1
9    hors    ape       1
10   hors    dog       1

How can I modify the code so that direction does not matter, i.e. so that instead of ape bird 1 and bird ape 1, I get ape bird 2?

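First sort the values within each row, so that every pair is stored in alphabetical order regardless of its original direction. Note that df.values.sort() modifies df in place.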

In [31]: df
Out[31]:
   source target
0     ape    dog
1     ape   hous
2     dog   hous
3    hors    dog
4    hors    ape
5     dog    ape
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10   bird    ape
11   fist    ape

In [32]: df.values.sort()

In [33]: df
Out[33]:
   source target
0     ape    dog
1     ape   hous
2     dog   hous
3     dog   hors
4     ape   hors
5     ape    dog
6     ape   bird
7     ape   hous
8    bird   hous
9    bird   fist
10    ape   bird
11    ape   fist
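
The in-place `df.values.sort()` above only works when `df.values` hands back a view onto the frame's own data, which pandas does not guarantee in every version or configuration. A sketch of a non-mutating variant is shown here; it assumes both columns hold plain strings, and the names `pairs` and `df_sorted` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "source": ["ape", "ape", "dog", "hors", "hors", "dog",
               "ape", "ape", "bird", "bird", "bird", "fist"],
    "target": ["dog", "hous", "hous", "dog", "ape", "ape",
               "bird", "hous", "hous", "fist", "ape", "ape"],
})

# Sort the two labels of every row alphabetically. np.sort returns a new
# array, so the original frame is left untouched.
pairs = np.sort(df.to_numpy(dtype=object), axis=1)
df_sorted = pd.DataFrame(pairs, columns=df.columns, index=df.index)

# Count each undirected pair, most frequent first.
df_count = (
    df_sorted.groupby(["source", "target"])
    .size()
    .reset_index(name="weight")
    .sort_values("weight", ascending=False, kind="mergesort")
)
print(df_count)
```

Because a new frame is built, `df` still holds the directed edges afterwards, which can be useful if both views of the data are needed.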

Then group by source and target, aggregate with size, and sort the result.

In [34]: df.groupby(['source', 'target']).size().sort_values(ascending=False)
    ...:   .reset_index(name='weight')
Out[34]:
  source target  weight
0    ape   hous       2
1    ape    dog       2
2    ape   bird       2
3    dog   hous       1
4    dog   hors       1
5   bird   hous       1
6   bird   fist       1
7    ape   hors       1
8    ape   fist       1

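Alternatively, sort each row with apply, then pass the name parameter to reset_index to label the count column. On recent pandas versions, apply with a function that returns a list produces a Series of lists unless result_type is set, so result_type='broadcast' is used here to keep the original source and target columns:

df_count = df.apply(sorted, axis=1, result_type='broadcast') \
             .groupby(['source', 'target']) \
             .size() \
             .reset_index(name='weight') \
             .sort_values('weight', ascending=False)
print(df_count)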
  source target  weight
0    ape   bird       2
1    ape    dog       2
4    ape   hous       2
2    ape   fist       1
3    ape   hors       1
5   bird   fist       1
6   bird   hous       1
7    dog   hors       1
8    dog   hous       1
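
As a cross-check on the counts above, the same tally can be sketched without pandas by keying each edge on its sorted endpoints. This is not part of either answer; the `edges` list is just the question's data retyped, and with a DataFrame the same pairs could come from `zip(df["source"], df["target"])`.

```python
from collections import Counter

# The question's edges as (source, target) tuples.
edges = [
    ("ape", "dog"), ("ape", "hous"), ("dog", "hous"), ("hors", "dog"),
    ("hors", "ape"), ("dog", "ape"), ("ape", "bird"), ("ape", "hous"),
    ("bird", "hous"), ("bird", "fist"), ("bird", "ape"), ("fist", "ape"),
]

# Sorting the endpoints makes (bird, ape) and (ape, bird) share one key.
counts = Counter(tuple(sorted(edge)) for edge in edges)

for (a, b), weight in counts.most_common():
    print(a, b, weight)
```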
