How can I aggregate data with categorical responses to get the percentage of each response type in R?

I want to get the percentage of each categorical answer type for different types of questions (TYPE). Each individual has multiple responses for each type, and each response is categorical with several possible levels.

The result should meet two requirements:
1) each individual should be on a separate row, and
2) the columns should be TYPE + response level, with the value being the proportion of times that particular response level was given for that question type by that individual.

The DATA looks like this:

SUBJECT TYPE    RESPONSE  
John    a   kappa                       
John    b   gamma  
John    a   delta  
John    a   gamma  
Mary    a   kappa   
Mary    a   delta       
Mary    b   kappa  
Mary    a   gamma  
Bill    b   delta  
Bill    a   gamma  

The result should look like this:

SUBJECT a-kappa   a-gamma   a-delta   b-kappa   b-gamma   b-delta
John    0.33      0.33      0.33      0.00      1.00      0.00
Mary    0.33      0.33      0.33      1.00      0.00      0.00
Bill    0.00      1.00      0.00      0.00      0.00      1.00

Based on c1au61o_HH's answer I was able to create something that works for my actual data file, but it still needs some post-processing. (It is also not very elegant, but that's a minor concern.)

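 library(dplyr)

 Finaldf <- mydata %>%
   group_by(Subject, Type) %>%
   mutate(TOT = n()) %>%                     # responses per subject and type
   group_by(Subject, Response, Type) %>%
   mutate(RESPTOT = n())                     # responses per subject, type and level

 Finaldf <- distinct(Finaldf)                # one row per Subject/Type/Response
 Finaldf$Percentage <- Finaldf$RESPTOT / Finaldf$TOT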

Any help is much appreciated, ideally with some explanation.

This is probably not the most efficient way, but if you want to use tidyverse you can unite the two columns and then use two different group_by calls: one to calculate the total for each subject and one to calculate the percentages.

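library(tidyverse)
df %>% 
  # combine TYPE and RESPONSE into one label, e.g. "a_kappa"
  unite(TYPE_RESPONSE, c("TYPE", "RESPONSE"), sep = "_") %>% 
  group_by(SUBJECT) %>% 
  mutate(TOT = n()) %>%                      # total responses per subject
  group_by(SUBJECT, TYPE_RESPONSE) %>% 
  # TOT is constant within a group; first(TOT) yields one row per group
  summarize(perc = n() / first(TOT) * 100) %>% 
  spread(TYPE_RESPONSE, perc, fill = 0)      # wide format; absent combinations become 0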

DATA:

df <- tibble( SUBJECT= rep(c("John", "Mary","Bill"), each = 4), 
                 TYPE = rep(c("a","b"), 6),
                 RESPONSE = rep(c("kappa", "gamma", "delta"), 4)
)
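
For comparison, the per-subject percentages from the snippet above can also be sketched in base R with table() and prop.table(). This is only an illustrative alternative, not part of the original answer; the names counts and perc are made up for the example, and the data frame simply mirrors the DATA block.

```r
# Sample data mirroring the DATA block above
df <- data.frame(
  SUBJECT  = rep(c("John", "Mary", "Bill"), each = 4),
  TYPE     = rep(c("a", "b"), 6),
  RESPONSE = rep(c("kappa", "gamma", "delta"), 4)
)

# Cross-tabulate subjects against the combined TYPE_RESPONSE label
counts <- table(df$SUBJECT, paste(df$TYPE, df$RESPONSE, sep = "_"))

# margin = 1 divides each row by its row total,
# i.e. by the number of responses given by that subject
perc <- prop.table(counts, margin = 1) * 100
perc
```

Each row of perc sums to 100 because the denominator is the subject's total number of responses, regardless of TYPE.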

EDIT in reply to comment:

I understand that you want to calculate the percentage by SUBJECT and TYPE, so the code would be something like this:

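library(tidyverse)
df %>% 
  group_by(SUBJECT, TYPE) %>% 
  mutate(TOT = n()) %>%                      # total responses per subject and type
  ungroup() %>%                              # drop grouping before TYPE is merged away
  unite(TYPE_RESPONSE, c("TYPE", "RESPONSE"), sep = "_") %>% 
  group_by(SUBJECT, TYPE_RESPONSE) %>% 
  # TOT is constant within a group; first(TOT) yields one row per group
  summarize(perc = n() / first(TOT) * 100) %>% 
  spread(TYPE_RESPONSE, perc, fill = 0)      # wide format; absent combinations become 0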
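
The same idea can be sketched with count() and pivot_wider(), which may be easier to read. This is a hedged alternative rather than the answer's own code: it assumes a reasonably recent dplyr and tidyr, uses the sample data from the question, and the names prop and result are chosen only for illustration. It returns proportions (0 to 1) as in the question's desired output, not percentages.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- tibble(
  SUBJECT  = c("John", "John", "John", "John", "Mary",
               "Mary", "Mary", "Mary", "Bill", "Bill"),
  TYPE     = c("a", "b", "a", "a", "a", "a", "b", "a", "b", "a"),
  RESPONSE = c("kappa", "gamma", "delta", "gamma", "kappa",
               "delta", "kappa", "gamma", "delta", "gamma")
)

result <- df %>%
  count(SUBJECT, TYPE, RESPONSE) %>%   # n = times each level was given
  group_by(SUBJECT, TYPE) %>%
  mutate(prop = n / sum(n)) %>%        # share within subject and question type
  ungroup() %>%
  select(-n) %>%                       # keep n out of the wide table
  pivot_wider(
    names_from  = c(TYPE, RESPONSE),   # columns such as "a-kappa"
    values_from = prop,
    names_sep   = "-",
    values_fill = 0                    # combinations never given become 0
  )

result
```

Because the denominator is sum(n) within each SUBJECT and TYPE group, the proportions for one subject add up to 1 within each question type that subject answered.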
