I want to build a distribution of the total number of videos each user has watched. A watched video is marked with 1, otherwise 0. The users are the index of the DataFrame.
Assume the data looks like this:
A B C
User1 1 1 0
User2 0 1 0
User3 1 0 1
For each user, I want a count of all the 1s in that row.
I am doing something like this, but it doesn't work. I don't want to use something like applymap, as that seems to be slow.
d.groupby(d.index).sum(axis=1)
This raises an error saying that the axis argument is not recognized.
If you have duplicates in the index, you can use groupby with a double sum:
print(df)
       A  B  C
User1  1  1  0
User1  1  1  1
User2  0  1  0
User3  1  0  1
print(df.groupby(df.index).sum().sum(axis=1))
User1    5
User2    1
User3    2
dtype: int64
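As a rough sketch of an equivalent ordering, you could sum across the columns first and then group by the index level. The sample frame below is rebuilt from the printed output above; the variable name totals is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the printed frame above; User1 appears twice.
df = pd.DataFrame(
    {"A": [1, 1, 0, 1], "B": [1, 1, 1, 0], "C": [0, 1, 0, 1]},
    index=["User1", "User1", "User2", "User3"],
)

# Sum each row across columns, then combine rows that share an index label.
totals = df.sum(axis=1).groupby(level=0).sum()
print(totals)
```

This should give the same per-user totals as the double sum, just with the row-wise sum done before the grouping.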
If there are no duplicates, use only sum (from Psidom's comment):
df.sum(axis=1)
EDIT: To plot the distribution as a histogram:
import matplotlib.pyplot as plt
df.sum(axis=1).plot.hist()
plt.show()
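If you want the distribution as numbers rather than a plot, one option is value_counts on the per-user totals. This is a minimal sketch that assumes the three-user sample from the question; the names watched and distribution are made up for illustration.

```python
import pandas as pd

# Sample data from the question: 1 means the user watched the video.
d = pd.DataFrame(
    {"A": [1, 0, 1], "B": [1, 1, 0], "C": [0, 0, 1]},
    index=["User1", "User2", "User3"],
)

# Number of videos watched per user.
watched = d.sum(axis=1)

# How many users watched each number of videos, ordered by that number.
distribution = watched.value_counts().sort_index()
print(distribution)
```

Here the index of distribution is the number of videos watched and the values are how many users fall into each bucket.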
Use the transpose method of the DataFrame.
In [38]: d = pd.DataFrame({'A':[1,0,1],'B':[1,1,0],'C':[0,0,1]},index=['User1','User2','User3'])
In [39]: d
Out[39]:
A B C
User1 1 1 0
User2 0 1 0
User3 1 0 1
In [40]: d.transpose()
Out[40]:
User1 User2 User3
A 1 0 1
B 1 1 0
C 0 0 1
In [41]: d.transpose().sum()
Out[41]:
User1 2
User2 1
User3 2
dtype: int64
Or, as Psidom suggested, sum across the columns of your DataFrame with axis=1, which gives one total per row.
In [55]: d.sum(axis=1)
Out[55]:
User1 2
User2 1
User3 2
dtype: int64
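A quick way to convince yourself that the two approaches agree is to compare their results directly. This is a small sketch on the same sample frame; the variable names are just for illustration.

```python
import pandas as pd

d = pd.DataFrame(
    {"A": [1, 0, 1], "B": [1, 1, 0], "C": [0, 0, 1]},
    index=["User1", "User2", "User3"],
)

# Transpose, then sum down each (former row) column.
via_transpose = d.transpose().sum()

# Sum across the columns of each row directly.
via_axis = d.sum(axis=1)

# Both should hold the same per-user totals.
print(via_transpose.equals(via_axis))
```

The axis=1 form avoids building the transposed frame, so it is probably the simpler choice when the frame is large.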