
Grouping the values of all columns by index of a pandas dataframe

I basically want to build a distribution of the total number of videos each user has watched. A watched video is marked with 1, otherwise 0. The users are the index of the DataFrame.

Assume the data is like this:

        A   B   C
User1   1   1   0
User2   0   1   0
User3   1   0   1

For each user, I want a count of all the 1s in that row.

I am doing something like this, but it doesn't seem to work. I don't want to use something like applymap, as that seems to be slow.


It gives an error that the axis is not recognized.

If you have duplicates in the index, you can use groupby with a double sum:

print(df)
       A  B  C
User1  1  1  0
User1  1  1  1
User2  0  1  0
User3  1  0  1

print(df.groupby(df.index).sum().sum(1))
User1    5
User2    1
User3    2
dtype: int64

If there are no duplicates, a single sum is enough, as Psidom commented:
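A minimal sketch of the plain sum, assuming the unique-index frame from the question. The names `df` and `watched` are only illustrative.

```python
import pandas as pd

# Frame from the question: one row per user, 1 = watched, 0 = not watched.
df = pd.DataFrame(
    {"A": [1, 0, 1], "B": [1, 1, 0], "C": [0, 0, 1]},
    index=["User1", "User2", "User3"],
)

# axis=1 adds the values across the columns, giving one total per row (user).
watched = df.sum(axis=1)
print(watched)
```

Because the values are only 0 and 1, the row sum is the same as the count of 1s, so no element-wise function such as applymap is needed.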



import matplotlib.pyplot as plt



Use the transpose method of the DataFrame.

In [38]: d = pd.DataFrame({'A':[1,0,1],'B':[1,1,0],'C':[0,0,1]},index=['User1','User2','User3'])

In [39]: d
       A  B  C
User1  1  1  0
User2  0  1  0
User3  1  0  1

In [40]: d.transpose()
   User1  User2  User3
A      1      0      1
B      1      1      0
C      0      0      1

In [41]: d.transpose().sum()
User1    2
User2    1
User3    2
dtype: int64

Or, as Psidom suggested, sum across the columns of your DataFrame with axis=1, which gives one total per row.

In [55]: d.sum(axis=1)
User1    2
User2    1
User3    2
dtype: int64
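To get from the per-user totals to the distribution the question asks for, one possible approach (not part of the original answers) is to count how many users share each total with value_counts. A sketch, assuming the same frame `d` as above. The names `per_user` and `distribution` are illustrative.

```python
import pandas as pd

# Same frame as in the answer above.
d = pd.DataFrame(
    {"A": [1, 0, 1], "B": [1, 1, 0], "C": [0, 0, 1]},
    index=["User1", "User2", "User3"],
)

# Number of videos watched by each user.
per_user = d.sum(axis=1)

# How many users watched each number of videos, ordered by that number.
distribution = per_user.value_counts().sort_index()
print(distribution)
```

Here the index of `distribution` is the number of videos watched and the values are the number of users with that total.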

