How to create a table showing the frequency of all dummy variables in R

I am new to R. I want to create a frequency table of all my dummy variables, and I have data like this:

ID Dummy_2008 Dummy_2009 Dummy_2010 Dummy_2011 Dummy_2012 Dummy_2013
1  1          1          0          0          1          1
2  0          0          1          1          0          1
3  0          0          1          0          0          1
4  0          1          1          0          0          1
5  0          0          0          0          1          0
6  0          0          0          1          0          0
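
For anyone who wants to reproduce this, the sample above could be entered roughly as follows. This is only a sketch: the name dat is an assumption, chosen to match the name used in the first answer below.

```r
# Sample data from the question; the object name `dat` is an assumption
dat <- data.frame(
  ID         = 1:6,
  Dummy_2008 = c(1, 0, 0, 0, 0, 0),
  Dummy_2009 = c(1, 0, 0, 1, 0, 0),
  Dummy_2010 = c(0, 1, 1, 1, 0, 0),
  Dummy_2011 = c(0, 1, 0, 0, 0, 1),
  Dummy_2012 = c(1, 0, 0, 0, 1, 0),
  Dummy_2013 = c(1, 1, 1, 1, 0, 0)
)
dat
```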

I want to see the total frequency of each value in every variable, like this:

            0    1   sum
Dummy_2008  5    1   6
Dummy_2009  4    2   6
Dummy_2010  3    3   6
Dummy_2011  4    2   6
Dummy_2012  4    2   6
Dummy_2013  2    4   6

I only know how to use table(), but that handles only one variable at a time. I have many time-series dummy variables, and I want to see their trend.

Many thanks for the help. Terence

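# Tabulate every column except ID, then transpose so the variables become rows
result <- as.data.frame(t(sapply(dat[,-1], table)))
# Add a column holding the total count for each variable
result$Sum <- rowSums(result)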

           0 1 Sum
Dummy_2008 5 1   6
Dummy_2009 4 2   6
Dummy_2010 3 3   6
Dummy_2011 4 2   6
Dummy_2012 4 2   6
Dummy_2013 2 4   6

Explanation:

sapply applies a function to each column of a data frame and, when every result has the same length, returns a matrix. So sapply(dat[,-1], table) returns a matrix with the output of table for each column (except the first column, ID, which we've excluded).
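
To see what that intermediate step produces, it can help to inspect the sapply result before transposing. The sketch below uses a small hypothetical data frame (the columns a and b are illustrative, not from the question):

```r
# Small hypothetical data frame; column names are illustrative only
dat <- data.frame(ID = 1:4, a = c(0, 1, 1, 0), b = c(0, 0, 0, 1))

counts <- sapply(dat[, -1], table)

is.matrix(counts)   # both columns contain 0 and 1, so sapply can simplify
dimnames(counts)    # rows are the values "0" and "1", columns are the variables
counts
```

At this point the variables are still the columns, which is why the next step transposes the result.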

The matrix needs to be transposed so that the column names from the original data frame become the rows and the dummy values become the columns, so we use the t (transpose) function for that.

We want a data frame, not a matrix, so we wrap the whole thing in as.data.frame.

Next, we want another column giving the total number of values, so we use the rowSums function.
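
One caveat worth knowing: the approach relies on every column containing both 0 and 1. If a dummy is constant, its table has a single entry and sapply returns a list rather than a matrix. A possible workaround, sketched here on hypothetical data (columns a and b are made up), is to fix the factor levels before tabulating:

```r
# Hypothetical data in which column b never takes the value 1
dat <- data.frame(ID = 1:4, a = c(0, 1, 1, 0), b = c(0, 0, 0, 0))

# The tables now differ in length, so sapply cannot simplify to a matrix
is.list(sapply(dat[, -1], table))

# Fixing the levels to 0 and 1 gives every table length 2,
# with a zero count for any value that does not occur
counts <- sapply(dat[, -1], function(x) table(factor(x, levels = 0:1)))
result <- as.data.frame(t(counts))
result$Sum <- rowSums(result)
result
```

This is only one way to handle the edge case; it assumes the dummies are coded strictly as 0 and 1.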

Here is another option using mtabulate (from the qdapTools package) and addmargins, with the data frame named df1:

library(qdapTools)
addmargins(as.matrix(mtabulate(df1[-1])),2)
#           0 1 Sum
#Dummy_2008 5 1   6
#Dummy_2009 4 2   6
#Dummy_2010 3 3   6
#Dummy_2011 4 2   6
#Dummy_2012 4 2   6
#Dummy_2013 2 4   6
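
To illustrate what the second argument of addmargins does, here is a base-R sketch on a hand-built count matrix. The matrix m is hypothetical and merely mimics the shape of the counts shown above, so the example runs without qdapTools:

```r
# Hypothetical count matrix: variables as rows, values "0" and "1" as columns
m <- matrix(c(5, 4, 1, 2), nrow = 2,
            dimnames = list(c("Dummy_2008", "Dummy_2009"), c("0", "1")))

with_row_totals <- addmargins(m, 2)  # margin 2 appends a "Sum" column of row totals
with_col_totals <- addmargins(m, 1)  # margin 1 would append a "Sum" row instead

with_row_totals
```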
