python - How to find the largest group in pandas

Question

I have a ratings DataFrame with the columns userId, movieId, and rating. I want to find the user who has submitted the most ratings.

Here is the code I wrote:

import pandas as pd
ratings = pd.read_csv('ratings.csv') # userId,movieId,rating
user_rating_counts = ratings[['userId','movieId']].groupby('userId')['movieId'].agg(['count'])
top_rator = user_rating_counts[user_rating_counts['count']==user_rating_counts['count'].max()]

The file looks like this:

userId,movieId,rating
1,1,4.0
1,3,4.0
1,6,4.0
1,47,5.0
1,50,5.0
1,70,3.0
1,101,5.0
1,110,4.0

When I view top_rator in a Jupyter notebook, it looks like this:

       count
userId  
414     2698

I would like to get a tuple out of it, for example:

(414, 2698)

How can I do that?

P.S. Any comments on how to do this better/faster/shorter would be appreciated.

Answer 1

You can do:

sizes = df.groupby(['userId']).size()
(sizes.idxmax(), sizes.max())
#(1, 8)

Details:

Group by userId and get the size of each group.

sizes = df.groupby(['userId']).size()
#userId
#1    8
#2    1

Use idxmax and max to build a tuple holding the user with the most ratings and that user's rating count:

(sizes.idxmax(), sizes.max())
#(1, 8)
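
For reference, here is a minimal self-contained sketch of the same approach. It assumes the sample rows from the question, inlined with io.StringIO so that it runs without ratings.csv. The int() casts are an addition: idxmax and max return NumPy integers here, and casting gives plain Python ints in the tuple.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs without ratings.csv.
csv_text = """userId,movieId,rating
1,1,4.0
1,3,4.0
1,6,4.0
1,47,5.0
1,50,5.0
1,70,3.0
1,101,5.0
1,110,4.0
"""
ratings = pd.read_csv(io.StringIO(csv_text))

# Number of rows (that is, ratings) per user.
sizes = ratings.groupby("userId").size()

# idxmax gives the userId with the largest count, max gives the count itself.
# int() turns the NumPy integers into plain Python ints.
top = (int(sizes.idxmax()), int(sizes.max()))
print(top)  # (1, 8)
```

Note that idxmax returns only the first label when several users share the maximum.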

Answer 2

If only one user matches the max, you can simply use:

next(top_rator.max(1).items())

Explanation

top_rator.max(1) will return:

userId
1    8
dtype: int64

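Series.items() lazily iterates over the Series, yielding (index, value) tuples from a zip object.

next() retrieves the "next" (here, the first) tuple from that iterator.

If several users match the maximum, use a list comprehension instead: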

[(idx, val) for idx, val in top_rator.max(1).items()]
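
The two snippets above can be tried end to end with a sketch like the following. The small DataFrame is made-up illustration data (not the asker's file), and top_rator is rebuilt the same way as in the question. The names row_max, first, and all_top are introduced here only for the example.

```python
import pandas as pd

# Illustrative data: user 2 has the most ratings (4).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 2, 3],
    "movieId": [1, 3, 6, 47, 50, 70, 101, 110],
    "rating":  [4.0, 4.0, 4.0, 5.0, 5.0, 3.0, 5.0, 4.0],
})

# Same construction as in the question: a one-column DataFrame named 'count'.
user_rating_counts = ratings.groupby("userId")["movieId"].agg(["count"])
top_rator = user_rating_counts[
    user_rating_counts["count"] == user_rating_counts["count"].max()
]

# Row-wise max collapses the single 'count' column into a Series indexed by userId.
row_max = top_rator.max(axis=1)

# First (index, value) pair only.
first = next(row_max.items())

# Every (index, value) pair, which also covers ties.
all_top = [(idx, val) for idx, val in row_max.items()]

print(first)
print(all_top)
```

Writing axis=1 as a keyword does the same thing as the positional max(1) and is a little easier to read.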

Answer 3

Use groupby with size, then call Series.agg with a list containing idxmax and max:

tup = tuple(ratings.groupby('userId').size().agg(['idxmax','max']))
print (tup)
(1, 8)

Explanation:

First, aggregate the size of each group:

#changed data - multiple groups
print (df)
   userId  movieId  rating
0       1        1     4.0
1       1        3     4.0
2       1        6     4.0
3       2       47     5.0
4       2       50     5.0
5       2       70     3.0
6       2      101     5.0
7       3      110     4.0

print (df.groupby('userId').size())
userId
1    3
2    4
3    1
dtype: int64

The output is a Series, so call Series.agg with a list of the functions idxmax and max to get the index label of the maximum and the maximum value itself:

print (df.groupby('userId').size().agg(['idxmax','max']))
idxmax    2
max       4
dtype: int64

Finally, convert to a tuple:

print (tuple(df.groupby('userId').size().agg(['idxmax','max'])))
(2, 4)

A solution for the case where several groups share the same maximum size:

print (ratings)   
   userId  movieId  rating
0       1        1     4.0
1       1        3     4.0
2       1        6     4.0
3       2       47     5.0
4       2       50     5.0
5       2       70     3.0
6       3      101     5.0
7       3      110     4.0

First, aggregate the size of each group; here 2 groups share the maximum of 3:

user_rating_counts = ratings.groupby('userId')['movieId'].size()
print (user_rating_counts)
userId
1    3
2    3
3    2
Name: movieId, dtype: int64

So first use boolean indexing:

top_rator = (user_rating_counts[user_rating_counts == user_rating_counts.max()])
print (top_rator)
userId
1    3
2    3
Name: movieId, dtype: int64

Create a DataFrame and convert it to a list of tuples:

tup = list(map(tuple, top_rator.reset_index().values.tolist()))
print (tup)
[(1, 3), (2, 3)]
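
As a possible shorter variant of the tie-handling steps (a sketch, not part of the original answer), the filtered Series can be turned into pairs directly with Series.items(), skipping the intermediate DataFrame. The data matches the tie example above. The names counts, top, and pairs are chosen here for illustration.

```python
import pandas as pd

# Same tie data as above: users 1 and 2 both have 3 ratings.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3],
    "movieId": [1, 3, 6, 47, 50, 70, 101, 110],
    "rating":  [4.0, 4.0, 4.0, 5.0, 5.0, 3.0, 5.0, 4.0],
})

# Ratings per user.
counts = ratings.groupby("userId").size()

# Boolean indexing keeps every user whose count equals the maximum.
top = counts[counts == counts.max()]

# items() yields (userId, count) pairs, so no reset_index is needed.
pairs = list(top.items())
print(pairs)

# For comparison: idxmax reports only the first of the tied users.
first_only = counts.idxmax()
```

With this data, pairs holds (1, 3) and (2, 3), while first_only is 1, which is why the boolean-indexing step matters when ties are possible.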
