
Assigning a class to an ID based on the highest number of subclasses - Pandas

So actually I have a dataframe with some videoIds, under each of which there is a chain of videos with subcategories, and I want to assign each videoId its most frequently occurring class. My dataframe looks like this:

videoId   postId   class

12234     788         1
12234     789         1
12234     790         3
12234     791         4
12234     792         1
12234     793         4

So I want a dataframe like this for every such videoId:

videoId   class
  12234      1

This is because the most frequently occurring class under that videoId is 1 (counting the classes of its sub-posts).
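To make the counting concrete, here is a minimal sketch that rebuilds the first table as a DataFrame (column names videoId, postId and class taken from the question) and counts the posts per class under each videoId with value_counts():

```python
import pandas as pd

# First sample table from the question
df = pd.DataFrame({
    "videoId": [12234] * 6,
    "postId": [788, 789, 790, 791, 792, 793],
    "class": [1, 1, 3, 4, 1, 4],
})

# Number of posts in each class, per videoId (MultiIndex: videoId, class)
counts = df.groupby("videoId")["class"].value_counts()
print(counts)
```

Class 1 has 3 posts, class 4 has 2 and class 3 has 1, which is why 1 is the expected answer for videoId 12234.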

Now suppose there is a tie between the classes, like this:

videoId   postId   class

1620      34          1
1620      35          1
1620      36          2
1620      37          2

I want it to be like this:

 videoId  class
 1620      1
 1620      2

So when there is a tie between the subclasses, I want all of them to appear for that videoId. I have tried several ways, using value_counts(), max(), etc., but was not able to reach a solution.
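The value_counts()/max() idea can work if the maximum is taken per videoId and compared against every class count, instead of picking a single winner (idxmax() returns only the first class with the highest count, so it drops ties). A minimal sketch of that approach, assuming both sample tables are stacked into one DataFrame; it uses groupby(...).size(), which yields the same per-class counts as value_counts():

```python
import pandas as pd

# Both sample tables from the question stacked into one DataFrame
df = pd.DataFrame({
    "videoId": [12234] * 6 + [1620] * 4,
    "postId": [788, 789, 790, 791, 792, 793, 34, 35, 36, 37],
    "class": [1, 1, 3, 4, 1, 4, 1, 1, 2, 2],
})

# Number of posts per (videoId, class) pair, as a flat frame with a column "n"
counts = df.groupby(["videoId", "class"]).size().reset_index(name="n")

# Highest count within each videoId, broadcast back to every row of that group
group_max = counts.groupby("videoId")["n"].transform("max")

# Keep every class whose count equals its group's maximum, so ties survive
result = counts.loc[counts["n"] == group_max, ["videoId", "class"]].reset_index(drop=True)
print(result)
```

For the sample data this keeps class 1 for videoId 12234 and both classes 1 and 2 for videoId 1620.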

You can simply apply mode over the groupby and reset the index; pd.Series.mode returns every value tied for the highest count, so ties are kept, i.e.

import pandas as pd
df.groupby('videoId')['class'].apply(pd.Series.mode).reset_index(level=0)

  videoId  class
0     1620      1
1     1620      2
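A quick way to check the mode answer on more than one videoId at once, assuming the two sample tables are combined into one DataFrame:

```python
import pandas as pd

# Both sample tables from the question stacked into one DataFrame
df = pd.DataFrame({
    "videoId": [12234] * 6 + [1620] * 4,
    "postId": [788, 789, 790, 791, 792, 793, 34, 35, 36, 37],
    "class": [1, 1, 3, 4, 1, 4, 1, 1, 2, 2],
})

# pd.Series.mode returns every most-frequent value per group, so a tie gives several rows;
# reset_index(level=0) turns the videoId index level back into a column
modes = df.groupby("videoId")["class"].apply(pd.Series.mode).reset_index(level=0)
print(modes)
```

Series.mode returns its values sorted, so videoId 1620 gets two rows (classes 1 and 2) and 12234 gets one (class 1). The index left after reset_index(level=0) is the position within each group's mode; chain another reset_index(drop=True) if a clean 0..n-1 index is wanted.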

One way to do this is to use dense ranking:

df.groupby('videoId')['class'].value_counts()\
  .groupby(level='videoId').rank(method='dense', ascending=False)\
  .rename('ranking')\
  .reset_index()\
  .query('ranking == 1')

Output:

   videoId  class  ranking
0     1620      1      1.0
1     1620      2      1.0
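The ranking has to be computed inside each videoId. With both sample tables in one DataFrame, ranking the whole value_counts() Series at once gives videoId 12234's class 1 (3 posts) rank 1 and both of 1620's classes (2 posts each) rank 2, so the 1620 tie would be filtered out. A minimal sketch that ranks per group with groupby(level=0), assuming the same column names as the question:

```python
import pandas as pd

# Both sample tables from the question stacked into one DataFrame
df = pd.DataFrame({
    "videoId": [12234] * 6 + [1620] * 4,
    "postId": [788, 789, 790, 791, 792, 793, 34, 35, 36, 37],
    "class": [1, 1, 3, 4, 1, 4, 1, 1, 2, 2],
})

# Posts per class, indexed by (videoId, class)
counts = df.groupby("videoId")["class"].value_counts()

# Dense rank within each videoId (index level 0), highest count gets rank 1
ranking = counts.groupby(level=0).rank(method="dense", ascending=False).rename("ranking")

# Keep only the top-ranked classes; tied classes share rank 1
result = ranking.reset_index().query("ranking == 1")
print(result)
```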
