R frequency table of multiple categorical variables

I've imported interview data from an SPSS .SAV file as a data.frame, and now I'm trying to create a frequency table based on the question number and interview location. Here's an example data.frame:

loc<-c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2")
q1<-c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE")
q2<-c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO")
q3<-c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
df<-data.frame(loc,q1,q2,q3)

df
     loc    q1    q2    q3
1  city1   YES   YES    NO
2  city2   YES    NO    NO
3  city1    NO MAYBE    NO
4  city2 MAYBE   YES    NO
5  city1    NO    NO   YES
6  city1    NO MAYBE   YES
7  city2   YES MAYBE MAYBE
8  city2    NO   YES MAYBE
9  city1 MAYBE   YES    NO
10 city2 MAYBE    NO MAYBE

Now I would like to count the number of occurrences of each answer option ("YES", "NO", "MAYBE") by question number ("q1", "q2", "q3") and location ("city1", "city2"). The resulting data.frame should look like this:

   loc quest  answ freq
1  city1    q1   YES    1
2  city1    q1    NO    3
3  city1    q1 MAYBE    1
4  city2    q1   YES    2
5  city2    q1    NO    1
6  city2    q1 MAYBE    2
7  city1    q2   YES    2
8  city1    q2    NO    1
9  city1    q2 MAYBE    2
10 city2    q2   YES    2
11 city2    q2    NO    2
12 city2    q2 MAYBE    1
13 city1    q3   YES    2
14 city1    q3    NO    3
15 city1    q3 MAYBE    0
16 city2    q3   YES    0
17 city2    q3    NO    2
18 city2    q3 MAYBE    3

So far I've played with count(), ddply() and summarise() from the plyr package with no luck. My current solution is really hacky: it involves splitting df by loc, creating a frequency table with as.data.frame(summary(df_city1)), retrieving the frequency from the summary string, and merging the summary data.frames of city1 and city2 back together. I guess there has to be an easier/more elegant solution.

We convert the dataset from 'wide' to 'long' format (gather does that), then group by 'loc', 'quest' and 'answ' (group_by), and use tally to get the count. But if combinations that do not occur in the dataset should appear with a count of 0, we need to join with a dataset that holds all the unique combinations of the three columns (complete and unique do that).

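library(dplyr)
library(tidyr)

# All combinations of loc, quest and answ, including those absent from the data
dfN <- gather(df, quest, answ, q1:q3) %>%
  complete(loc, quest, answ) %>%
  unique()

# Count the observed combinations, then set the missing ones to 0
res <- gather(df, quest, answ, q1:q3) %>%
  group_by(loc, quest, answ) %>%
  tally() %>%
  left_join(dfN, ., by = c("loc", "quest", "answ")) %>%
  mutate(n = ifelse(is.na(n), 0, n))
res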
#     loc quest  answ     n
#   (fctr) (chr) (chr) (dbl)
#1   city1    q1 MAYBE     1
#2   city1    q1    NO     3
#3   city1    q1   YES     1
#4   city1    q2 MAYBE     2
#5   city1    q2    NO     1
#6   city1    q2   YES     2
#7   city1    q3 MAYBE     0
#8   city1    q3    NO     3
#9   city1    q3   YES     2
#10  city2    q1 MAYBE     2
#11  city2    q1    NO     1
#12  city2    q1   YES     2
#13  city2    q2 MAYBE     1
#14  city2    q2    NO     2
#15  city2    q2   YES     2
#16  city2    q3 MAYBE     3
#17  city2    q3    NO     2
#18  city2    q3   YES     0
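
The same wide-to-long, count, then fill-in-zeros idea can probably be written more compactly with newer package versions. The sketch below is not part of the original answer; it assumes tidyr 1.0.0 or later (for pivot_longer) and dplyr 0.8.0 or later (for the name argument of count). The name res2 and the column name freq are chosen here only to match the layout requested in the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  loc = c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2"),
  q1  = c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE"),
  q2  = c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO"),
  q3  = c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
)

res2 <- df %>%
  # wide -> long: one row per (loc, question, answer)
  pivot_longer(q1:q3, names_to = "quest", values_to = "answ") %>%
  # count the observed combinations; the count column is named freq
  count(loc, quest, answ, name = "freq") %>%
  # add the unobserved combinations with a count of 0
  complete(loc, quest, answ, fill = list(freq = 0L))

res2
```

Because complete() fills the missing combinations directly, this variant needs no separate lookup table and no join. Rows come back sorted by loc, quest and answ, so the order differs from the table in the question.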
