Reformat categorical data in R

I have a categorical survey dataset that I am trying to summarize, and the questions in it differ in kind. The data below come from a questionnaire that had standard closed-ended questions, but also questions where respondents could choose multiple answers from a list. "VILLAGE" and "INCOME" are closed-ended questions. The columns "responsible.1" through "responsible.5" represent a list where the respondent said yes or no to each item: an entry (e.g. DLNR) means yes, NA means no.

VILLAGE  INCOME         responsible.1   responsible.2   responsible.3   responsible.4   responsible.5
   j     both           DLNR             NA              DEQ              NA           Public
   k     regular.income DLNR             NA              NA               NA           NA
   k     regular.income DLNR             CRM             DEQ              Mayor        NA
   l     both           DLNR             NA              NA               Mayor        NA
   j     both           DLNR             CRM             NA               Mayor        NA
   m     regular.income DLNR             NA              NA               NA           Public
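To make the example reproducible, the table above can be read straight into a data frame. This is a sketch that assumes the table is pasted as whitespace-separated text and that the literal NA entries mean "not selected"; `dat` is the name the answer below uses, and `answered` is a hypothetical helper name.

```r
# Read the survey table from text; the "NA" strings become missing values.
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
VILLAGE INCOME         responsible.1 responsible.2 responsible.3 responsible.4 responsible.5
j       both           DLNR          NA            DEQ           NA            Public
k       regular.income DLNR          NA            NA            NA            NA
k       regular.income DLNR          CRM           DEQ           Mayor         NA
l       both           DLNR          NA            NA            Mayor         NA
j       both           DLNR          CRM           NA            Mayor         NA
m       regular.income DLNR          NA            NA            NA            Public
")

# Each responsible.* column is a yes/no item: a name means "yes", NA means "no".
answered <- !is.na(dat[grep("^responsible", names(dat))])

# Number of "yes" answers per item across all respondents
colSums(answered)
```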

What I want is a three-way table with "VILLAGE", "INCOME" and the whole set of "responsible" variables wrapped up into an ftable. That way, I could use the table with numerous R packages for graphs and analyses.

        RESPONSIBLE             
VILLAGE INCOME          responsible.1   responsible.2   responsible.3   responsible.4   responsible.5
j       both            2               1               1               1               1
k       regular income  2               1               1               1               0
l       both            1               0               0               1               0
m       regular income  1               0               0               0               1

as.data.frame(table(VILLAGE, responsible.1)) would get me the first column, but I can't figure out how to get the entire thing wrapped up in a nice ftable.
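One way to get every responsible column into a single ftable is to stack the five yes/no columns into long format and cross-tabulate the 0/1 indicator with `xtabs()`. This is a sketch under the same assumption that NA means "not selected"; `resp_cols`, `long`, `answered`, `tab` and `ft` are illustrative names, not part of the original post.

```r
# Same survey data as in the question; NA means "not selected".
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
VILLAGE INCOME         responsible.1 responsible.2 responsible.3 responsible.4 responsible.5
j       both           DLNR          NA            DEQ           NA            Public
k       regular.income DLNR          NA            NA            NA            NA
k       regular.income DLNR          CRM           DEQ           Mayor         NA
l       both           DLNR          NA            NA            Mayor         NA
j       both           DLNR          CRM           NA            Mayor         NA
m       regular.income DLNR          NA            NA            NA            Public
")

resp_cols <- grep("^responsible", names(dat), value = TRUE)

# Long format: one row per respondent x item, column by column.
long <- data.frame(
  VILLAGE     = rep(dat$VILLAGE, times = length(resp_cols)),
  INCOME      = rep(dat$INCOME, times = length(resp_cols)),
  RESPONSIBLE = rep(resp_cols, each = nrow(dat)),
  answered    = as.integer(unlist(lapply(dat[resp_cols], function(x) !is.na(x))))
)

# Sum the 0/1 indicator over the three factors, then flatten it
# with VILLAGE and INCOME on the rows and RESPONSIBLE on the columns.
tab <- xtabs(answered ~ VILLAGE + INCOME + RESPONSIBLE, data = long)
ft  <- ftable(tab, row.vars = c("VILLAGE", "INCOME"))
ft
```

Because `xtabs()` builds the full cross-classification, the ftable lists every VILLAGE/INCOME combination, so pairs that never occur in the data (for example village j with regular.income) appear as rows of zeros.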

> aggregate(dat[-(1:2)], dat[1:2], function(x) sum(!is.na(x)))
  VILLAGE         INCOME responsible.1 responsible.2 responsible.3 responsible.4 responsible.5
1       j           both             2             1             1             1             1
2       l           both             1             0             0             1             0
3       k regular.income             2             1             1             1             0
4       m regular.income             1             0             0             0             1

I'm guessing you actually had another grouping vector, perhaps the first "responsible" column?

I don't really understand the sorting rules, but reversing the order of the grouping columns may be closer to what you posted:

> aggregate(dat[-(1:2)], dat[2:1], function(x) sum(!is.na(x)))
          INCOME VILLAGE responsible.1 responsible.2 responsible.3 responsible.4 responsible.5
1           both       j             2             1             1             1             1
2 regular.income       k             2             1             1             1             0
3           both       l             1             0             0             1             0
4 regular.income       m             1             0             0             0             1
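On the sorting question: `aggregate()` appears to order its result so that the first grouping column varies fastest, which would explain why the first call lists j, l, k, m. Rather than relying on that, the result can be sorted explicitly with `order()`. A minimal sketch, assuming `dat` is built as in the question; `res` is an illustrative name.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
VILLAGE INCOME         responsible.1 responsible.2 responsible.3 responsible.4 responsible.5
j       both           DLNR          NA            DEQ           NA            Public
k       regular.income DLNR          NA            NA            NA            NA
k       regular.income DLNR          CRM           DEQ           Mayor         NA
l       both           DLNR          NA            NA            Mayor         NA
j       both           DLNR          CRM           NA            Mayor         NA
m       regular.income DLNR          NA            NA            NA            Public
")

# Count non-missing answers per VILLAGE/INCOME group.
res <- aggregate(dat[-(1:2)], dat[1:2], function(x) sum(!is.na(x)))

# Sort explicitly by village, then income, and renumber the rows.
res <- res[order(res$VILLAGE, res$INCOME), ]
rownames(res) <- NULL
res
```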

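Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.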

© 2020-2024 STACKOOM.COM