
Value counts of 2 columns in a pandas dataframe

I have a table in the format below. I would like to compute the value counts over both columns (year and operation) and get them as a fraction within each year. For example, in year 2014 the value "yes" appears in 2 of 3 rows, hence 2/3 ≈ 0.66. I tried value_counts as shown here, but it did not produce the result listed further down. Any leads would be appreciated.

df[['year', 'operation']].apply(pd.Series.value_counts)

year operation
2014    yes
2014    yes
2014    no
2015    
2015    yes
2015    yes

Expected result:

2014   yes     0.66
2014   no      0.33
2015           0.33
2015   yes     0.66

Let's try SeriesGroupBy.value_counts with normalize=True to get the counts as proportions of each year's rows:

out = df.groupby('year')['operation'].value_counts(normalize=True)

out:

year  operation
2014  yes          0.666667
      no           0.333333
2015  yes          0.666667
                   0.333333
Name: operation, dtype: float64
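
To make it clearer what normalize=True computes, here is a minimal sketch that reproduces it by hand: each (year, operation) count is divided by the number of rows in its year. The variable names (grouped, counts, manual, normalized) are only illustrative. Note that on pandas 2.0 and later the resulting Series is, as far as I know, named "proportion" rather than "operation", so the Name: line in the output may differ from the one shown above.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})

grouped = df.groupby('year')['operation']

# Raw counts for every (year, operation) pair
counts = grouped.value_counts()

# Divide each count by the total number of rows in its year
manual = counts / counts.groupby(level='year').transform('sum')

# Built-in equivalent of the manual division above
normalized = grouped.value_counts(normalize=True)

print(manual)
print(normalized)
```

Both printed Series should hold the same values, and the proportions within each year sum to 1.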

You can also set sort=False so that the values within each level-0 group (year) are not sorted by frequency:

out = df.groupby('year')['operation'].value_counts(normalize=True, sort=False)

out:

year  operation
2014  no           0.333333
      yes          0.666667
2015               0.333333
      yes          0.666667
Name: operation, dtype: float64
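
If the row order matters, one option (a sketch, not the only way) is to not rely on the order value_counts returns and instead sort the result explicitly by its index labels with Series.sort_index. The name by_label is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})

out = df.groupby('year')['operation'].value_counts(normalize=True, sort=False)

# Order rows by (year, operation) label, independent of frequency
by_label = out.sort_index()

print(by_label)
```

The sort flag only affects the order of the rows; the proportions themselves are the same either way.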

Series.reset_index can be used with name= to create a DataFrame instead of a Series and to give a name to the otherwise unnamed values column:

new_df = (
    df.groupby('year')['operation'].value_counts(normalize=True)
        .reset_index(name='freq')
)

new_df:

   year operation      freq
0  2014       yes  0.666667
1  2014        no  0.333333
2  2015       yes  0.666667
3  2015            0.333333
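
Since the question asks for percentages, here is a small sketch that builds on the DataFrame form and adds a percentage column. The column name pct and the rounding to two decimals are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})

new_df = (
    df.groupby('year')['operation'].value_counts(normalize=True)
      .reset_index(name='freq')
)

# Convert the 0-1 proportion into a percentage rounded to two decimals
new_df['pct'] = (new_df['freq'] * 100).round(2)

print(new_df)
```

Keeping freq as a plain float and deriving pct from it leaves the original proportions available for further arithmetic.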

DataFrame used:

import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015, 2015],
                   'operation': ['yes', 'yes', 'no', '', 'yes', 'yes']})
