
Calculate proportion (percent) for each column of a dataset

I'm trying to calculate the proportion (percent) of categories in each column of a dataset.

Example data:

df <- data.frame(
  Size = c("Y", "N", "N", "Y", "Y"),
  Type = c("N", "N", "N", "Y", "N"),
  Age  = c("N", "Y", "N", "Y", "N"),
  Sex  = c("N", "N", "N", "N", "N")
)

df

This produces a table like this:

  Size Type Age Sex
1    Y    N   N   N
2    N    N   Y   N
3    N    N   N   N
4    Y    Y   Y   N
5    Y    N   N   N

I've tried using prop.table to calculate the proportions for one column:

prop.table(table(df$Size))

This works, but it only calculates the percentage of Y or N answers for one column. The dataset is quite large, so I'd like to calculate the proportion for every column at once.

My goal is to have a table that shows the proportion of "yes" (Y) answers for each column.

Like this:

       Proportion Y
Size    0.60
Type    0.20
Age     0.40
Sex     0.00

I am relatively new to R, so any help would be appreciated!

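One way in base R would be to use apply column-wise on a logical matrix. df == "Y" compares every cell with "Y" and returns a matrix of TRUE/FALSE values, and the mean of a logical vector is the proportion of TRUE values: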

apply(df == "Y", 2, mean)

#Size Type  Age  Sex 
# 0.6  0.2  0.4  0.0 

A simpler version with colMeans:

colMeans(df == "Y")
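
To get the result in the one-column layout shown in the question, the named vector returned by colMeans can be wrapped in a data frame. This is only a minimal sketch: the column name Proportion_Y is my own choice, and na.rm = TRUE is an assumption that matters only if the real data contains missing answers (the example data has none).

```r
df <- data.frame(
  Size = c("Y", "N", "N", "Y", "Y"),
  Type = c("N", "N", "N", "Y", "N"),
  Age  = c("N", "Y", "N", "Y", "N"),
  Sex  = c("N", "N", "N", "N", "N")
)

# df == "Y" is a logical matrix; the mean of TRUE/FALSE is the share of TRUE.
# na.rm = TRUE skips missing answers (assumption: not needed for this example).
prop_y <- colMeans(df == "Y", na.rm = TRUE)

# The names of prop_y become the row names, giving one row per original column.
# "Proportion_Y" is a hypothetical column name chosen for illustration.
result <- data.frame(Proportion_Y = prop_y)
result
```

Multiplying prop_y by 100 before building the data frame would give percentages instead of proportions.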

A dplyr approach:

library(dplyr)
df %>% summarise_all(~mean(.=="Y"))
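
In dplyr 1.0.0 and later, summarise_all() is documented as superseded in favour of across(). Assuming such a version is installed, an equivalent call would look roughly like the sketch below; the variable name props is just for illustration.

```r
library(dplyr)

df <- data.frame(
  Size = c("Y", "N", "N", "Y", "Y"),
  Type = c("N", "N", "N", "Y", "N"),
  Age  = c("N", "Y", "N", "Y", "N"),
  Sex  = c("N", "N", "N", "N", "N")
)

# across(everything(), ...) applies the lambda to every column;
# .x stands for the current column inside the lambda.
props <- df %>%
  summarise(across(everything(), ~ mean(.x == "Y")))
props
```

The result is a one-row data frame with one proportion per original column.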

If you have more than one group:

df1 <- data.frame(class = "A", df)
df2 <- data.frame(class = "B", df)
# make df2 different
df2$Size <- rep("Y", 5)
newdf <- rbind(df1, df2)
newdf %>% group_by(class) %>% summarise_all(~ mean(. == "Y"))
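
If you would rather avoid the dplyr dependency, the grouped proportions can probably also be computed in base R with aggregate(). The following is a sketch under the same setup as above; by_class is a name I made up, and the code assumes class is the first column of newdf.

```r
df <- data.frame(
  Size = c("Y", "N", "N", "Y", "Y"),
  Type = c("N", "N", "N", "Y", "N"),
  Age  = c("N", "Y", "N", "Y", "N"),
  Sex  = c("N", "N", "N", "N", "N")
)
df1 <- data.frame(class = "A", df)
df2 <- data.frame(class = "B", df)
df2$Size <- rep("Y", 5)  # make group B differ from group A
newdf <- rbind(df1, df2)

# newdf[-1] drops the class column (assumed to be first); comparing the rest
# with "Y" gives a logical matrix, and aggregate() takes the mean per group.
by_class <- aggregate(newdf[-1] == "Y",
                      by = list(class = newdf$class),
                      FUN = mean)
by_class
```

Each row of by_class corresponds to one class, with the proportion of "Y" answers in each remaining column.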
