简体   繁体   English

函数使用规则返回最高计数值

[英]function to return the highest count value using a rule

I have two columns like shown below, and trying to return the highest count of the second column, but its just returning me the highest count on rating without considering the gender 我有两个列,如下所示,并试图返回第二列的最高计数,但它只是返回我最高的评级计数而不考虑性别

DATA : 数据:

print (df) 打印(df)

   AGE GENDER rating
0   10      M     PG
1   10      M      R
2   10      M      R
3    4      F   PG13
4    4      F   PG13

CODE : 代码:

 s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])

OUTPUT : 输出:

print (s[F])
('PG')

print(s[M]

('PG', 'R')

Here is a standard library solution for this file: 以下是此文件的标准库解决方案:

%%file "test.txt"
gender  rating
M   PG
M   R
F   NR
M   R
F   PG13
F   PG13

Given 特定

import collections as ct


def read_file(fname):
    with open(fname, "r") as f:
        header = next(f)
        for line in f:
            gender, rating = line.strip().split()
            yield gender, rating

Code

filename = "test.txt"

dd = ct.defaultdict(ct.Counter)
for k, v in sorted(read_file(filename), key=lambda x: x[0]):
    dd[k][v] += 1 

{k: v.most_common(1) for k, v in dd.items()}
# {'F': [('PG13', 2)], 'M': [('R', 2)]}

Details 细节

Each line of the file is parse and added to a defaultdict . 该文件的每一行都是解析并添加到defaultdict The keys are genders, but the values are Counter objects for each rating per gender. 键是性别,但每个性别的每个评级的值都是Counter对象。 Counter.most_common() is called to retrieve the top occurrences. 调用Counter.most_common()来检索Counter.most_common()出现的事件。

Since the data is grouped by gender, you can explore more information. 由于数据按性别分组,因此您可以浏览更多信息。 For example, unique ratings of each gender: 例如,每个性别的唯一评级:

{k: set(v.elements()) for k, v in dd.items()}
# {'F': {'NR', 'PG13'}, 'M': {'PG', 'R'}}

I think you need for counts with categories and ratings use groupby + value_counts + head : 我认为您需要使用groupby + value_counts + head来计算类别和评级:

df1 = (df.groupby('gender')['rating']
         .apply(lambda x: x.value_counts().head(1))
         .rename_axis(('gender','rating'))
         .reset_index(name='val'))
print (df1)
  gender rating  val
0      F   PG13    2
1      M      R    2

If want only top ratings seelct first value of index per group: 如果只想要最高评级seelct每组的索引第一个值:

s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().index[0])
print (s)
gender
F    PG13
M       R
Name: rating, dtype: object

print (s['M'])
R
print (s['F'])
PG13

Or only top counts select first value of Series per group: 或者只有最高计数选择每组Series第一个值:

s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().iat[0])
print (s)
gender
F    2
M    2
Name: rating, dtype: int64

print (s['M'])
2
print (s['F'])
2

EDIT: 编辑:

s = df.groupby('gender')['rating'].apply(lambda x: x.value_counts().index[0])

def gen_mpaa(gender):
    return s[gender]

print (gen_mpaa('M'))

print (gen_mpaa('F'))

EDIT: 编辑:

Solution if genre id values are strings: 解决方案,如果genre id值是字符串:

print (type(df.loc[0, 'genre id']))
<class 'str'>

df = df.set_index('gender')['genre id'].str.split(',', expand=True).stack()
print (df)
gender   
M       0    11
        1    22
        2    33
        0    22
        1    44
        2    55
        0    33
        1    44
        2    55
F       0    11
        1    22
        0    22
        1    55
        0    55
        1    44
dtype: object

d = df.groupby(level=0).apply(lambda x: x.value_counts().index[0]).to_dict()
print (d)
{'M': '55', 'F': '55'}

EDIT1: EDIT1:

print (df)
   AGE GENDER rating
0   10      M     PG
1   10      M      R
2   10      M      R
3    4      F   PG13
4    4      F   PG13

s = (df.groupby(['AGE', 'GENDER'])['rating']
       .apply(lambda x: x.value_counts().head(2))
       .rename_axis(('a','b', 'c'))
       .reset_index(level=2)['c'])
print (s)

a   b
4   F    PG13
10  M       R
    M      PG
Name: c, dtype: object

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM