
sapply - retain column names

I am trying to summarise the mean, sd, etc. for a number of different columns (variables) in my dataset. I have written my own summary function that returns exactly what I need, and I am using sapply to apply it to all the variables at once. That works, but the result has no column names, and I can't even rename the columns by referring to them by number; they seem impossible to use in any way.

My code is below. Since I am only computing summary statistics, I would like to keep the same column (variable) names, with 4 rows (mean, sd, min, max). Is there any way at all to do this, even a slow way where I change the column names manually?

# GENERATING DESCRIPTIVE STATISTICS
sfsum= function(x){
  mean=mean(x)
  sd=sd(x)
  min=min(x)
  max=max(x)

  return(c(mean,sd,min,max))
}

#
c= list(sfbalanced$age_child, sfbalanced$earnings_child,
        sfbalanced$logchildinc, sfbalanced$p_inc84, sfbalanced$login84,
        sfbalanced$p_inc85, sfbalanced$login85, sfbalanced$p_inc86,
        sfbalanced$login86, sfbalanced$p_inc87, sfbalanced$login87,
        sfbalanced$p_inc88, sfbalanced$login88)

summ=sapply(c,sfsum)

names(summ)
 NULL
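The result of sapply here is a matrix rather than a data frame, and a matrix keeps its labels in dimnames, not in a names attribute; that is why names(summ) is NULL. colnames() and rownames() can therefore be assigned directly, which is the manual route the question asks about. A minimal sketch, assuming a small hypothetical stand-in for sfbalanced (the real data isn't available here):

```r
# Hypothetical stand-in for sfbalanced, with three of its columns
sfbalanced <- data.frame(
  age_child      = c(10, 12, 14, 16),
  earnings_child = c(100, 250, 175, 300),
  p_inc84        = c(20, 35, 50, 40)
)

# Same statistics as the question's sfsum, returned without names
sfsum <- function(x) {
  c(mean(x), sd(x), min(x), max(x))
}

cols <- list(sfbalanced$age_child, sfbalanced$earnings_child, sfbalanced$p_inc84)
summ <- sapply(cols, sfsum)

class(summ)   # "matrix" "array" in R >= 4.0
names(summ)   # NULL: a matrix stores its labels in dimnames

# Label the rows and columns by hand
colnames(summ) <- c("age_child", "earnings_child", "p_inc84")
rownames(summ) <- c("mean", "sd", "min", "max")
summ
```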

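If you name the elements of the vector returned inside the function, those names become the row names of the result. If you also name the elements of the list you pass to sapply, those names become the column names. (USE.NAMES = TRUE is already the default; it only adds names when X is a character vector, and the names of a named list are kept either way.)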

Here is an example using the built-in mtcars data set, which gives the output shown below.

Code

sfsum= function(x){
    mean=mean(x)
    sd=sd(x)
    min=min(x)
    max=max(x)

    return(c("mean"=mean,"sd"=sd,"min" = min,"max" =max)) #For rownames
}

# Name the list elements: these names become the column names
x= list("mpg" = mtcars$mpg, "disp" = mtcars$disp, "drat" = mtcars$drat)

summ=sapply(x,sfsum, USE.NAMES = TRUE) # USE.NAMES = TRUE is the default; the list names are kept either way

Output:

> summ
           mpg     disp      drat
mean 20.090625 230.7219 3.5965625
sd    6.026948 123.9387 0.5346787
min  10.400000  71.1000 2.7600000
max  33.900000 472.0000 4.9300000
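The case where USE.NAMES does extra work is when X is a character vector: sapply then uses the strings themselves as column names. A sketch of applying that to the question's setup, passing the variable names as strings and looking each column up inside an anonymous function. It is shown on mtcars and redefines the named-return sfsum from above so the block runs on its own:

```r
# Named return values become the row names
sfsum <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

vars <- c("mpg", "disp", "drat")

# X is a character vector, so USE.NAMES = TRUE (the default) turns
# each string into the name of its column in the result
summ <- sapply(vars, function(v) sfsum(mtcars[[v]]))
summ
```

With the question's data, vars would hold names such as "age_child" and "p_inc84", and mtcars would be replaced by sfbalanced. sapply(sfbalanced[vars], sfsum) should give the same table, because a data frame subset is already a named list.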

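If you want the variables' names as column names as well, apply the function directly to the data frame (assuming it should be applied to every column). A data frame is a list of named columns, so sapply keeps those names; only the row names need to be set: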

out <- sapply(df2, sfsum)
row.names(out) <- c('mean', 'sd', 'min', 'max')

Data:

set.seed(24)
df2 <- as.data.frame(matrix(rnorm(4*4), 4, 4))
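An alternative that avoids the separate row.names() step is vapply(). Its FUN.VALUE argument is a template for each per-column result; when the template is named, vapply takes the row names from it, and it also checks that every column returns exactly four doubles. A sketch on the same df2 sample data, with the summary written inline (the name template is illustrative):

```r
set.seed(24)
df2 <- as.data.frame(matrix(rnorm(4 * 4), 4, 4))

# Template: four doubles whose names become the row names of the result
template <- c(mean = 0, sd = 0, min = 0, max = 0)

# Column names (V1..V4) come from the data frame itself
out <- vapply(df2, function(x) c(mean(x), sd(x), min(x), max(x)), template)
out
```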

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this, please credit the site URL or the original address. Any questions, please contact: yoyou2525@163.com.