Assigning class to an ID based on highest no. of subclasses - Pandas
I have a dataframe with some videoIds. Under each videoId there is a chain of videos (posts), each with a subcategory (class), and I want to assign each videoId its most frequently occurring class. My dataframe looks like this:
videoId postId class
12234 788 1
12234 789 1
12234 790 3
12234 791 4
12234 792 1
12234 793 4
For every such videoId, I want a result like this:
videoId class
12234 1
This is because 1 is the most frequently occurring class under that videoId (counting the classes of its subposts).
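To see where the 1 comes from, here is a minimal sketch that counts the classes per videoId with value_counts() on the sample table above (the DataFrame construction is assumed from that table):

```python
import pandas as pd

# Sample data from the first table above
df = pd.DataFrame({
    "videoId": [12234] * 6,
    "postId": [788, 789, 790, 791, 792, 793],
    "class": [1, 1, 3, 4, 1, 4],
})

# Count how often each class occurs within each videoId
counts = df.groupby("videoId")["class"].value_counts()
print(counts)
```

Class 1 occurs three times, class 4 twice and class 3 once, so 1 is the class to assign.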
Now suppose there is a tie between classes, like this:
videoId postId class
1620 34 1
1620 35 1
1620 36 2
1620 37 2
Then I want the result to be:
videoId class
1620 1
1620 2
So when there is a tie between classes, I want all of the tied classes to appear for that videoId.
I have tried several ways using value_counts(), max(), etc., but was not able to reach a solution.
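For reference, the value_counts()/max() idea can be made to work: count each (videoId, class) pair, compute the maximum count per videoId with transform, and keep every class whose count equals that maximum, so ties are kept automatically. This is a minimal sketch assuming the column names from the question; n and result are names chosen here for illustration.

```python
import pandas as pd

# Both sample tables from the question combined into one DataFrame
df = pd.DataFrame({
    "videoId": [12234] * 6 + [1620] * 4,
    "postId": [788, 789, 790, 791, 792, 793, 34, 35, 36, 37],
    "class": [1, 1, 3, 4, 1, 4, 1, 1, 2, 2],
})

# Number of posts per (videoId, class) pair
counts = df.groupby(["videoId", "class"]).size().reset_index(name="n")

# Highest count within each videoId, broadcast back to every row
max_n = counts.groupby("videoId")["n"].transform("max")

# Keep all classes that reach the maximum, so ties produce several rows
result = counts.loc[counts["n"] == max_n, ["videoId", "class"]].reset_index(drop=True)
print(result)
```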
You can simply apply mode over the groupby and reset the index, i.e.:
df.groupby('videoId')['class'].apply(pd.Series.mode).reset_index(level=0)
videoId class
0 1620 1
1 1620 2
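The output above is for the tie example (videoId 1620). Below is a self-contained sketch that runs the same mode expression on both sample tables combined. Series.mode returns every most frequent value (sorted), which is why a tie yields several rows for one videoId.

```python
import pandas as pd

# Both sample tables from the question combined into one DataFrame
df = pd.DataFrame({
    "videoId": [12234] * 6 + [1620] * 4,
    "postId": [788, 789, 790, 791, 792, 793, 34, 35, 36, 37],
    "class": [1, 1, 3, 4, 1, 4, 1, 1, 2, 2],
})

# mode() returns all most frequent values; groupby-apply stacks them under each videoId
result = (
    df.groupby("videoId")["class"]
    .apply(pd.Series.mode)
    .reset_index(level=0)  # move videoId from the index into a column
)
print(result)
```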
One way to do this is to use dense ranking within each videoId:
df.groupby('videoId')['class'].value_counts()\
    .groupby(level=0).rank(method='dense', ascending=False)\
    .rename('ranking')\
    .reset_index()\
    .query('ranking == 1')
Output:
videoId class ranking
0 1620 1 1.0
1 1620 2 1.0
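The ranking has to be computed within each videoId; ranking the whole value_counts() Series at once would compare counts across different videos. Here is a self-contained sketch on both sample tables combined, ranking per videoId via groupby(level=0):

```python
import pandas as pd

# Both sample tables from the question combined into one DataFrame
df = pd.DataFrame({
    "videoId": [12234] * 6 + [1620] * 4,
    "postId": [788, 789, 790, 791, 792, 793, 34, 35, 36, 37],
    "class": [1, 1, 3, 4, 1, 4, 1, 1, 2, 2],
})

ranked = (
    df.groupby("videoId")["class"]
    .value_counts()
    # Dense rank of each class count within its own videoId (1 = most frequent)
    .groupby(level=0)
    .rank(method="dense", ascending=False)
    .rename("ranking")
    .reset_index()
)

# Rank 1 marks the most frequent class(es); tied classes share rank 1
result = ranked.query("ranking == 1")
print(result)
```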