How can I aggregate data with categorical responses to get the percentage of each response type in R?
I want to get percentages of categorical answer types for different types of questions (TYPE). For each individual I have multiple responses per question type, and each response is categorical with several levels. In the result:
1) each individual should be on a different row, and
2) the columns should be the TYPE + response level combinations, with the value being the percentage of times that particular response level was given for that question type by that individual.
The DATA looks like this:
SUBJECT TYPE RESPONSE
John a kappa
John b gamma
John a delta
John a gamma
Mary a kappa
Mary a delta
Mary b kappa
Mary a gamma
Bill b delta
Bill a gamma
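To make the example reproducible, the table above can be read into R as text. This is only a sketch: the object names `qdata` and `n_by_type` are illustrative and do not come from my real file. The second step counts how many answers each subject gave per question type, which is the denominator I have in mind for the percentages.

```r
# Illustrative object name; the values are copied from the table above.
qdata <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
SUBJECT TYPE RESPONSE
John a kappa
John b gamma
John a delta
John a gamma
Mary a kappa
Mary a delta
Mary b kappa
Mary a gamma
Bill b delta
Bill a gamma
")

# Number of answers per subject and question type (the intended denominators)
n_by_type <- with(qdata, table(SUBJECT, TYPE))
n_by_type
```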
The result should look like this:
SUBJECT a-kappa a-gamma a-delta b-kappa b-gamma b-delta
John 0.33 0.33 0.33 0.00 1.00 0.00
Mary 0.33 0.33 0.33 1.00 0.00 0.00
Bill 0.00 1.00 0.00 0.00 0.00 1.00
Based on c1au61o_HH's answer I was able to create something that works for my actual data file, but it will still need some post-processing. (It is also not very elegant, but that's a minor concern.)
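library(dplyr)

# Count responses per subject and question type (TOT),
# then per subject, response level and question type (RESPTOT)
Finaldf <- mydata %>%
  group_by(Subject, Type) %>%
  mutate(TOT = n()) %>%
  group_by(Subject, Response, Type) %>%
  mutate(RESPTOT = n())
# Drop the repeated rows and compute the share of each response level
Finaldf <- distinct(Finaldf)
Finaldf$Percentage <- Finaldf$RESPTOT / Finaldf$TOT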
Any help is much appreciated, preferably with some explanation.
This is probably not the most efficient way, but if you want to use tidyverse you can unite the two columns and then use two different group_by calls to calculate the total for each subject and the percentages.
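library(tidyverse)

df %>%
  # Combine TYPE and RESPONSE into a single key, e.g. "a_kappa"
  unite(TYPE_RESPONSE, c("TYPE", "RESPONSE"), sep = "_") %>%
  # Total number of responses per subject
  group_by(SUBJECT) %>%
  mutate(TOT = n()) %>%
  group_by(SUBJECT, TYPE_RESPONSE) %>%
  # TOT is constant within a group, so first(TOT) gives one value per group
  summarize(perc = n() / first(TOT) * 100) %>%
  # One column per TYPE_RESPONSE; absent combinations become 0 instead of NA
  spread(TYPE_RESPONSE, perc, fill = 0)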
DATA:
df <- tibble(
  SUBJECT  = rep(c("John", "Mary", "Bill"), each = 4),
  TYPE     = rep(c("a", "b"), 6),
  RESPONSE = rep(c("kappa", "gamma", "delta"), 4)
)
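The code above divides by the number of responses per subject. As a small sketch of how that denominator differs from a total per subject and question type, the two can be counted side by side on the sample data. The object names `per_subject` and `per_subject_type` are illustrative, and the data is rebuilt with `data.frame()` only so that the block runs on its own.

```r
library(dplyr)

# Same sample data as above, rebuilt so this block is self-contained
df <- data.frame(
  SUBJECT  = rep(c("John", "Mary", "Bill"), each = 4),
  TYPE     = rep(c("a", "b"), 6),
  RESPONSE = rep(c("kappa", "gamma", "delta"), 4),
  stringsAsFactors = FALSE
)

# Denominator when grouping by SUBJECT only: all responses of a subject
per_subject <- df %>% count(SUBJECT, name = "TOT")

# Denominator when grouping by SUBJECT and TYPE: responses per question type
per_subject_type <- df %>% count(SUBJECT, TYPE, name = "TOT")

per_subject
per_subject_type
```

With this sample data every subject has four responses in total but only two per question type, so the choice of grouping changes every percentage.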
EDIT in reply to comment:
I understand that you want to calculate the percentage by SUBJECT and TYPE, so the code would be something like this:
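library(tidyverse)

df %>%
  # Total number of responses per subject and question type
  group_by(SUBJECT, TYPE) %>%
  mutate(TOT = n()) %>%
  ungroup() %>%
  # Combine TYPE and RESPONSE into a single key, e.g. "a_kappa"
  unite(TYPE_RESPONSE, c("TYPE", "RESPONSE"), sep = "_") %>%
  group_by(SUBJECT, TYPE_RESPONSE) %>%
  # TOT is constant within a group, so first(TOT) gives one value per group
  summarize(perc = n() / first(TOT) * 100) %>%
  # One column per TYPE_RESPONSE; absent combinations become 0 instead of NA
  spread(TYPE_RESPONSE, perc, fill = 0)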
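As a cross-check against the data in the question, here is a minimal sketch of the same idea written with `count()` and `pivot_wider()` rather than `unite()` and `spread()`. It is an alternative formulation, not the code above: the names `qdata` and `wide` are illustrative, the values are copied from the question's table, and the result is a proportion on a 0 to 1 scale (as in the requested layout) instead of a percentage.

```r
library(dplyr)
library(tidyr)

# Example data from the question (illustrative object name)
qdata <- data.frame(
  SUBJECT  = c("John", "John", "John", "John", "Mary",
               "Mary", "Mary", "Mary", "Bill", "Bill"),
  TYPE     = c("a", "b", "a", "a", "a", "a", "b", "a", "b", "a"),
  RESPONSE = c("kappa", "gamma", "delta", "gamma", "kappa",
               "delta", "kappa", "gamma", "delta", "gamma"),
  stringsAsFactors = FALSE
)

wide <- qdata %>%
  # n = how often each response level was given per subject and type
  count(SUBJECT, TYPE, RESPONSE) %>%
  # Divide by the number of responses for that subject and type
  group_by(SUBJECT, TYPE) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  # One column per TYPE-RESPONSE pair; combinations never given become 0
  pivot_wider(
    names_from  = c(TYPE, RESPONSE),
    names_sep   = "-",
    values_from = prop,
    values_fill = list(prop = 0)
  )

wide
```

Only TYPE and RESPONSE combinations that occur at least once in the data get a column, so a level that nobody ever gave would have to be added separately.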