Summarizing a dataset with continuous and categorical variables

Question

If a dataset has mixed variables, numerical and categorical, is there a way to summarize it, beyond summary(dataset), so that the count of each category is included for categorical variables and the mean and sd are included for numerical variables?

Currently I use a code snippet that generates a list after checking whether each column is numerical or categorical, but a simpler function would be useful.

An example could be data.frame(v1 = c(1:3), v2 = c("a","b","b")), where the desired output is:

V1, type(num/cat), mean(v1), sd(v1)
V2, type(num/cat), a, count(a), b, count(b)

Answer 1

I think you're looking for the function describe() in the package 'Hmisc'. See the documentation for details.
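As a rough illustration of the suggestion above, here is a minimal sketch that calls describe() on the example data from the question. It assumes Hmisc is installed from CRAN, and the stringsAsFactors = TRUE argument is my own addition so that v2 is a factor. The output uses Hmisc's own layout and set of statistics, so it may not match the requested format exactly and may not include sd.

```r
# Example data from the question; stringsAsFactors = TRUE makes v2 a factor.
df <- data.frame(v1 = 1:3, v2 = c("a", "b", "b"), stringsAsFactors = TRUE)

# Hmisc is a CRAN package and may need install.packages("Hmisc") first.
# The guard skips the call instead of failing when it is not installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(df))
}
```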

Answer 2

Yes, I was looking for a table of counts for categorical variables and the mean and sd for numerical variables, which is what one commonly reports as descriptive statistics in research papers.

I wrote the following:

agg_function <- function(data_agg)
{
    desc_list <- list()

    for(j in seq_along(data_agg))
    {
        ## Treat character columns as categorical too: since R 4.0, data.frame() no longer converts strings to factors
        if(is.factor(data_agg[[j]]) || is.character(data_agg[[j]]))
        {
            ## Table of counts of labels of categorical variables
            desc_list[[j]] <- list(Variable = colnames(data_agg)[j], table(data_agg[[j]]))
        }
        else
        {
            ## First and second moments of numerical variables
            desc_list[[j]] <- data.frame(Variable = colnames(data_agg)[j], Mean = mean(data_agg[[j]], na.rm = TRUE), SD = sd(data_agg[[j]], na.rm = TRUE))
        }
    }
    return(desc_list)
}

But is there a more efficient solution?
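One possibly shorter variant, sketched here in base R only, is to let lapply() walk the columns instead of an explicit loop. The function name summarize_mixed is hypothetical, and treating every non-numeric column as categorical is an assumption that may not suit all datasets (dates, for example).

```r
# Hypothetical helper: returns a named list with one summary per column.
summarize_mixed <- function(data) {
  # lapply() iterates over the columns and keeps their names in the result.
  lapply(data, function(col) {
    if (is.numeric(col)) {
      # Mean and standard deviation for numerical variables
      c(mean = mean(col, na.rm = TRUE), sd = sd(col, na.rm = TRUE))
    } else {
      # Counts per label for everything else (factor, character, logical)
      table(col)
    }
  })
}

# Example data from the question
df <- data.frame(v1 = 1:3, v2 = c("a", "b", "b"))
res <- summarize_mixed(df)
print(res)
```

Because the result is a named list, a single variable's summary can be pulled out with res$v1 or res$v2.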

2 answers

solution1
1 ACCEPTED 2015-08-23 11:41:12

solution2
0 2015-08-24 02:00:44
