the data is something like this:
> head(r)
area peri shape perm
1 4990 2791.90 0.0903296 6.3
2 7002 3892.60 0.1486220 6.3
3 7558 3930.66 0.1833120 6.3
4 7352 3869.32 0.1170630 6.3
5 7943 3948.54 0.1224170 17.1
6 7979 4010.15 0.1670450 17.1
I want to perform multiple functions on each column, what I currently have is this function:
analysis = function(df){
measurements = data.frame(attributes = character(),
mean = double(),
median = double(),
variance = double(),
IQR = double())
for (i in 1:ncol(df)){
names = colnames(df)[i]
temp = data.frame(attribute = names,
mean = mean(df[,i]),
median = median(df[,i]),
variance = var(df[,i]),
IQR = IQR(df[,i]))
measurements = rbind(measurements, temp)
}
return (measurements)
}
It works well and achieve what I want which gives the following output:
attribute mean median variance IQR
1 area 7187.7291667 7487.000000 7.203045e+06 3564.2500000
2 peri 2682.2119375 2536.195000 2.049654e+06 2574.6150000
3 shape 0.2181104 0.198862 6.971657e-03 0.1004083
4 perm 415.4500000 130.500000 1.916848e+05 701.0500000
However, my supervisor said it is not efficient and not thinking in a R way. I also tried summarise_each()
and summarise_all(r, funs(mean, median, var, IQR))
but it doesn't achieve what I want and the output doesn't look nice.
What are some other ways to achieve that output only using base R or dplyr.
I suspect your supervisors comment about 'R'-style thinking was about using that for loop. Almost any for loop
you write can be replaced by the apply
family of functions (eg apply
, sapply
, lapply
etc).
They make it easier to run functions on vectors/data.frames/lists/etc.
Everything you could do using apply
functions could be replicated in for loops (usually with similar performance) so using for loops isn't actually a cardinal sin. Why use apply
functions? Well... once you learn them you get more succinct code which returns the results of running your functions on your data. Before long, you'll find this sort of code very intuitive, and even more readable than for loops.
df <- data.frame(
area = c(4990, 7002, 7558, 7352, 7943),
peri = c(2791.9, 3892.6, 3930.66, 3869.32, 3948.54),
shape = c(.0903296, .148622, .183312, .117063, .122417),
perm = c(6.3, 6.3, 6.3, 6.3, 17.1)
)
sapply(df, function(x) c(mean=mean(x), median=median(x), var=var(x), IQR=IQR(x)))
Your results can be achieved using base::Map
:
f <- function(x) {
desc = base::summary(x)
c(
Mean = unname(desc['Mean']),
Median = unname(desc['Median']),
Variance = base::sum((x-desc['Mean'])**2)/(length(x)-1),
IQR = unname(desc['3rd Qu.'] - desc['1st Qu.'])
)
}
t(as.data.frame(base::Map(f, df)))
# Mean Median Variance IQR
# area 7137.3333333 7455.0000000 1.241980e+06 757.25000000
# peri 3740.5283333 3911.6300000 2.183447e+05 68.93000000
# shape 0.1381314 0.1355195 1.192633e-03 0.04403775
# perm 9.9000000 6.3000000 3.110400e+01 8.10000000
Apologies
Data:
df <- data.frame(
area = c(4990, 7002, 7558, 7352, 7943, 7979),
peri = c(2791.9, 3892.6, 3930.66, 3869.32, 3948.54, 4010.15),
shape = c(.0903296, .148622, .183312, .117063, .122417, .167045),
perm = c(6.3, 6.3, 6.3, 6.3, 17.1, 17.1)
)
Hope that's useful.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.