I have the following data frame:
data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
product_id user_id
p1 a1
p1 a1
p2 a1
p1 a2
p1 a2
p1 a2
p2 a3
p2 a3
p3 a3
in real case there might be some other columns as well, but what i need to do is to group by data frame by product_id and user_id columns and count number of each combination and add it as a new column in a new dat frame
output should be something like this:
user_id product_id count
a1 p1 2
a1 p2 1
a2 p1 3
a3 p2 2
a3 p3 1
I have tried the following code:
grouped=data.groupby(['user_id','product_id']).count()
but the result is:
user_id product_id
a1 p1
p2
a2 p1
a3 p2
p3
actually the most important thing for me is to have a column names count that has the number of occurrences , i need to use the column later.
Maybe this is what you want?
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id product_id
a1 p1 2
p2 1
a2 p1 3
a3 p2 2
p3 1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
user_id product_id size
0 a1 p1 2
1 a1 p2 1
2 a2 p1 3
3 a3 p2 2
4 a3 p3 1
>>> new_df['size']
0 2
1 1
2 3
3 2
4 1
Name: size, dtype: int64
In Pandas 1.1.0 you can use the method value_counts
with DataFrames:
df.value_counts()
Output:
product_id user_id
p1 a2 3
p2 a3 2
p1 a1 2
p3 a3 1
p2 a1 1
If you need a DataFrame:
df.value_counts().to_frame('counts').reset_index()
Output:
product_id user_id counts
0 p1 a2 3
1 p2 a3 2
2 p1 a1 2
3 p3 a3 1
4 p2 a1 1
Building a little on @Nehal's response if you want to create another column with the results combined:
you can combine the product_id and size columns:
new_df['combo'] = new_df['product_id'].map(str) + '(' + new_df['size'].map(str) +')'
...and then use groupby again to organize that column by 'user_id'
new_df['combo'].astype(str).groupby(new_df['userid']).agg(['size',', '.join])
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.