Getting frequency counts for multiple columns grouped by another column

I am working on a questionnaire and the analysis will be based on the geographic region (a column in my data table).

In R, I am trying to find a way to summarise my entire questionnaire by geographic region (KPG): every geographic region as a row, and each possible answer to each question (A001, A002, etc.) as a column, including zero counts.

table(dummyframe$KPG, dummyframe$A001)
      1 2 3 4 5
  111 0 1 1 0 0
  112 1 1 0 0 0
  113 4 0 1 0 0
  114 0 3 1 1 0
  115 0 0 1 2 1
  116 1 0 0 0 0
xtabs(~KPG+A001,dummyframe)
 A001
KPG   1 2 3 4 5
  111 0 1 1 0 0
  112 1 1 0 0 0
  113 4 0 1 0 0
  114 0 3 1 1 0
  115 0 0 1 2 1
  116 1 0 0 0 0

Both approaches return the frequency counts in the desired format, as a table for question 1 (A001).

I expected to be able to do this for the many columns in my questionnaire by adding them, like so:

table(dummyframe$KPG, df$A001+A002)

but this cross-tabulates region against question 1 and then question 2 against question 1, whereas I want question 1 by region and question 2 by region, without the questions being cross-tabulated against each other.
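A minimal sketch of the behaviour described above, assuming the attempt amounted to adding a second question as another classifying variable. In an `xtabs()` formula each `+` term adds another dimension, so region, question 1 and question 2 end up cross-classified in a single three-way table. The data frame below is the question's `dput` data rewritten with `data.frame()`; `three_way` is an illustrative name.

```r
# Data from the question's UPDATE, rewritten with data.frame()
dummyframe <- data.frame(
  KPG  = c(111, 111, 112, 112, 113, 113, 113, 113, 113,
           114, 114, 114, 114, 114, 115, 115, 115, 115, 116),
  A001 = c(2, 3, 1, 2, 1, 1, 3, 1, 1, 2, 2, 4, 2, 3, 3, 4, 5, 4, 1),
  A002 = c(4, 5, 5, 3, 2, 1, 3, 4, 2, 3, 2, 4, 4, 1, 4, 5, 5, 2, 1),
  A003 = c(3, 2, 3, 4, 3, 4, 1, 4, 4, 4, 1, 3, 3, 4, 2, 4, 0, 3, 2),
  A004 = c(1, 3, 1, 2, 2, 1, 1, 1, 4, 4, 2, 1, NA, 0, 3, 0, 3, 0, 1),
  A005 = c(0, 4, 4, 4, 0, 0, 3, 3, 2, 1, 1, 4, 1, 4, 4, 0, 1, 1, 2)
)

# Every term on the right of ~ becomes another dimension:
# region x A001 answer x A002 answer, i.e. the questions are crossed
three_way <- xtabs(~ KPG + A001 + A002, data = dummyframe)
dim(three_way)  # 6 5 5

# What is wanted instead: one separate two-way table per question
xtabs(~ KPG + A001, data = dummyframe)
xtabs(~ KPG + A002, data = dummyframe)
```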

I would like to apply the table function to each column of my data frame separately, in one step, and then bind the results together so that my table shows all answers by region. I tried using aggregate:

aggregate(.~KPG, dummyframe, count)
KPG    A001       A002       A003       A004
1 111    2, 3       4, 5       2, 3       1, 3
2 112    1, 2       3, 5       3, 4       1, 2
3 113    1, 3 1, 2, 3, 4    1, 3, 4    1, 2, 4
4 114 2, 3, 4 1, 2, 3, 4    1, 3, 4 0, 1, 2, 4
5 115 3, 4, 5    2, 4, 5 0, 2, 3, 4       0, 3
6 116       1          1          2          1
 A005
1    0, 4
2       4
3 0, 2, 3
4    1, 4
5 0, 1, 4
6       2

This results in each cell being filled with values such as c(1, 3, 5) when answers 1, 3 and 5 were given, rather than with counts, which, as you can imagine, is very unhelpful.

Any ideas? A loop? lapply? tapply?

UPDATE: added data

structure(list(KPG = c(111L, 111L, 112L, 112L, 113L, 113L, 113L, 
113L, 113L, 114L, 114L, 114L, 114L, 114L, 115L, 115L, 115L, 115L, 
116L), A001 = c(2L, 3L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 2L, 4L, 
2L, 3L, 3L, 4L, 5L, 4L, 1L), A002 = c(4L, 5L, 5L, 3L, 2L, 1L, 
3L, 4L, 2L, 3L, 2L, 4L, 4L, 1L, 4L, 5L, 5L, 2L, 1L), A003 = c(3L, 
2L, 3L, 4L, 3L, 4L, 1L, 4L, 4L, 4L, 1L, 3L, 3L, 4L, 2L, 4L, 0L, 
3L, 2L), A004 = c(1L, 3L, 1L, 2L, 2L, 1L, 1L, 1L, 4L, 4L, 2L, 
1L, NA, 0L, 3L, 0L, 3L, 0L, 1L), A005 = c(0L, 4L, 4L, 4L, 0L, 
0L, 3L, 3L, 2L, 1L, 1L, 4L, 1L, 4L, 4L, 0L, 1L, 1L, 2L)), .Names =      c("KPG", 
"A001", "A002", "A003", "A004", "A005"), row.names = c(NA, 19L
), class = "data.frame")

UPDATE: expected output

    A001      A002      A003      A004      A005
    1 2 3 4 5 1 2 3 4 5 0 1 2 3 4 0 1 2 3 4 0 1 2
111 0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 0
112 1 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0
113 4 0 1 0 0 1 2 1 1 0 0 1 0 1 3 0 3 1 0 1 2 0 1
114 0 3 1 1 0 1 1 1 2 0 0 1 0 2 2 1 1 1 0 1 0 3 0
115 0 0 1 2 1 0 1 0 1 2 1 0 1 1 1 2 0 0 2 0 1 2 0
116 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1

do.call("cbind", lapply(names(dummyframe[-1]), function(x) { temp <- as.data.frame.matrix(table(dummyframe[["KPG"]], dummyframe[[x]])); setNames(temp, paste0(x, names(temp))) }))

As suggested, this gives the expected output, except that each question number and answer code are merged into a single column name (e.g. A0011 for question A001, answer 1), which can easily be reformatted in Excel.
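If the merged names are the only problem, one alternative (not from the original thread) is to reshape the data to long format and let `ftable()` print question and answer as two separate header rows. This is a base-R sketch; `long`, `tab` and `flat` are illustrative names. Because `xtabs()` builds one set of answer levels across all questions (0 to 5 in this data), every question gets a column for every answer code, with zeros where that answer was never given.

```r
# Data from the question's UPDATE, rewritten with data.frame()
dummyframe <- data.frame(
  KPG  = c(111, 111, 112, 112, 113, 113, 113, 113, 113,
           114, 114, 114, 114, 114, 115, 115, 115, 115, 116),
  A001 = c(2, 3, 1, 2, 1, 1, 3, 1, 1, 2, 2, 4, 2, 3, 3, 4, 5, 4, 1),
  A002 = c(4, 5, 5, 3, 2, 1, 3, 4, 2, 3, 2, 4, 4, 1, 4, 5, 5, 2, 1),
  A003 = c(3, 2, 3, 4, 3, 4, 1, 4, 4, 4, 1, 3, 3, 4, 2, 4, 0, 3, 2),
  A004 = c(1, 3, 1, 2, 2, 1, 1, 1, 4, 4, 2, 1, NA, 0, 3, 0, 3, 0, 1),
  A005 = c(0, 4, 4, 4, 0, 0, 3, 3, 2, 1, 1, 4, 1, 4, 4, 0, 1, 1, 2)
)

# Long format: one row per (respondent, question) pair.
# stack() puts the answers in `values` and the column names in `ind`,
# column by column, so KPG is repeated once per question.
long <- data.frame(KPG = rep(dummyframe$KPG, times = ncol(dummyframe) - 1),
                   stack(dummyframe[-1]))
names(long) <- c("KPG", "answer", "question")

# Three-way counts: region x question x answer (NA answers are excluded)
tab <- xtabs(~ KPG + question + answer, data = long)

# Flatten: regions as rows, question and answer as two header levels
flat <- ftable(tab, row.vars = "KPG")
flat
```

By default `xtabs()` excludes missing values, so the single `NA` in `A004` is not counted anywhere.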

We can extend what you were doing for one column to multiple columns by using lapply and then cbind the results together:

do.call("cbind", lapply(df[-1], function(x) table(df$KPG, x)))


#    1 2 3 4 5 1 2 3 4 5 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
#111 0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 0 0 1
#112 1 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0 2
#113 4 0 1 0 0 1 2 1 1 0 0 1 0 1 3 0 3 1 0 1 2 0 1 2 0
#114 0 3 1 1 0 1 1 1 2 0 0 1 0 2 2 1 1 1 0 1 0 3 0 0 2
#115 0 0 1 2 1 0 1 0 1 2 1 0 1 1 1 2 0 0 2 0 1 2 0 0 1
#116 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0
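The `table()` call above only creates columns for answers that actually occur in each question, so `A001` gets columns 1 to 5 while `A003` gets 0 to 4. A small variation, assuming every question shares the same answer scale (`answer_levels` below is a hypothetical 0 to 5 scale; set it to your questionnaire's codes), converts each column to a factor with fixed levels so that never-chosen answers still appear as zero columns, and prefixes each column with its question name.

```r
# Data from the question's UPDATE, rewritten with data.frame()
dummyframe <- data.frame(
  KPG  = c(111, 111, 112, 112, 113, 113, 113, 113, 113,
           114, 114, 114, 114, 114, 115, 115, 115, 115, 116),
  A001 = c(2, 3, 1, 2, 1, 1, 3, 1, 1, 2, 2, 4, 2, 3, 3, 4, 5, 4, 1),
  A002 = c(4, 5, 5, 3, 2, 1, 3, 4, 2, 3, 2, 4, 4, 1, 4, 5, 5, 2, 1),
  A003 = c(3, 2, 3, 4, 3, 4, 1, 4, 4, 4, 1, 3, 3, 4, 2, 4, 0, 3, 2),
  A004 = c(1, 3, 1, 2, 2, 1, 1, 1, 4, 4, 2, 1, NA, 0, 3, 0, 3, 0, 1),
  A005 = c(0, 4, 4, 4, 0, 0, 3, 3, 2, 1, 1, 4, 1, 4, 4, 0, 1, 1, 2)
)

# Hypothetical answer scale shared by all questions (adjust as needed)
answer_levels <- 0:5

counts <- do.call("cbind", lapply(names(dummyframe)[-1], function(q) {
  # Fixed factor levels give every question the same answer columns,
  # including answers nobody chose (counted as 0)
  tab <- table(dummyframe$KPG, factor(dummyframe[[q]], levels = answer_levels))
  # Prefix answer codes with the question name, e.g. "A001_1"
  colnames(tab) <- paste(q, colnames(tab), sep = "_")
  tab
}))

counts[, 1:6]  # columns A001_0 to A001_5
```

Binding works because every table has the same rows (the KPG regions, in the same order). `cbind()` on `table` objects returns a plain integer matrix; wrap it in `as.data.frame.matrix()` if a data frame is needed, for example before exporting.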

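Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.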

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM