I'm using R for a project, but I am very new to R and not very familiar with it. I have a single dataset, and I want to split it and display separate summaries using the summarize function. I wrote some code using a for loop, but I understand that for loops are usually avoided in R, due to its functional nature.
Basically, I want to learn how to convert my code into a more functional approach using a map or perhaps a group_split function, or whatever else would work. I've tried a few things and haven't figured it out yet.
I've written an example of what I am trying to do using a built-in R dataset:
library(tidyverse)
data(mtcars)
unique_gears <- unique(mtcars$gear)
for (g in unique_gears) {
  summ <- mtcars %>%
    filter(gear == g) %>%
    group_by(gear, cyl) %>%
    summarize(min = min(mpg), max = max(mpg), mean = mean(mpg))
  print(summ)
}
Using the mtcars dataset, that prints 3 separate summary tables, one per number of gears, with each table showing the minimum, maximum, and mean mpg for each number of cylinders.
I looked for ways to do that without using the for loop.
For example, I tried this:
mtcars %>% group_by(gear) %>% group_split() %>% group_by(cyl) %>% summarize(min = min(mpg))
I have the second group_by in there because I want the final summarize output to be grouped by another column (and I am using cyl for this example).
We don't need a loop here. Grouping by both columns at once gives the same summaries in a single table:
library(dplyr) # 1.0.0
mtcars %>%
  group_by(gear, cyl) %>%
  summarise(across(mpg, list(Min = min, Max = max, Mean = mean)))
# A tibble: 8 x 5
# Groups:   gear [3]
#    gear   cyl mpg_Min mpg_Max mpg_Mean
#   <dbl> <dbl>   <dbl>   <dbl>    <dbl>
#1     3     4    21.5    21.5     21.5
#2     3     6    18.1    21.4     19.8
#3     3     8    10.4    19.2     15.0
#4     4     4    21.4    33.9     26.9
#5     4     6    17.8    21       19.8
#6     5     4    26      30.4     28.2
#7     5     6    19.7    19.7     19.7
#8     5     8    15      15.8     15.4
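Note that the result above is still grouped by gear (the `# Groups: gear [3]` line). If a plain, ungrouped table is preferred, one option is the `.groups` argument of `summarise()`. This is a minimal sketch, assuming dplyr 1.0.0 or later; the name `gear_cyl_summary` is only for illustration:

```r
library(dplyr)

# Same summary as above, but returned as an ungrouped tibble
gear_cyl_summary <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(
    across(mpg, list(Min = min, Max = max, Mean = mean)),
    .groups = "drop"  # drop the leftover grouping by gear
  )

gear_cyl_summary
```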
If we want a map solution, we can group_split on 'gear' (the for loop iterates over the unique values of the 'gear' column), then map over the resulting list and do a second grouping by cyl before summarising:
library(purrr)
mtcars %>%
  group_split(gear) %>%
  map(~ .x %>%
        group_by(cyl) %>%
        summarize(min = min(mpg), max = max(mpg), mean = mean(mpg)))
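One thing to be aware of: group_split() returns an unnamed list, and the summaries no longer contain the gear column, so it is not obvious which table belongs to which gear. A possible variation, sketched here with base R's split() instead of group_split() (the name `by_gear` is just for illustration), keeps the gear value as the name of each list element:

```r
library(dplyr)
library(purrr)

# split() names each list element by its gear value ("3", "4", "5")
by_gear <- split(mtcars, mtcars$gear) %>%
  map(~ .x %>%
        group_by(cyl) %>%
        summarize(min = min(mpg), max = max(mpg), mean = mean(mpg)))

names(by_gear)

# Stack the per-gear tables back together, with the list names as a gear column
bind_rows(by_gear, .id = "gear")
```

The list form is closest to what the for loop prints, while the bind_rows() step gets back to a single table when that is more convenient.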
In addition to @akrun's answer, another way to solve the problem, if you are interested in rendering the summarized data as a report, is to use tables::tabular().
library(tables)
tabular((Factor(gear) * Factor(cyl)) ~ mpg * ((n = 1) + min + mean + max), data = mtcars)
...and the output:
          mpg
 gear cyl n   min  mean  max
 3    4    1  21.5 21.50 21.5
      6    2  18.1 19.75 21.4
      8   12  10.4 15.05 19.2
 4    4    8  21.4 26.93 33.9
      6    4  17.8 19.75 21.0
      8    0   Inf   NaN -Inf
 5    4    2  26.0 28.20 30.4
      6    1  19.7 19.70 19.7
      8    2  15.0 15.40 15.8
NOTE: unlike the dplyr solution, tabular() creates rows for every combination of the factor variables regardless of whether they have any observations, so it reports a row for the empty combination of 4 gears / 8 cylinders (n = 0, with min Inf, mean NaN, and max -Inf).
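For comparison, dplyr can be asked to keep empty combinations as well. The sketch below assumes that converting gear and cyl to factors is acceptable and uses the `.drop = FALSE` argument of group_by(); the name `all_combos` is only for illustration. It computes just the count and the mean, since min() and max() on an empty group would also emit warnings:

```r
library(dplyr)

# Keep every gear/cyl combination, including those with no observations
all_combos <- mtcars %>%
  mutate(gear = factor(gear), cyl = factor(cyl)) %>%
  group_by(gear, cyl, .drop = FALSE) %>%
  summarise(n = n(), mean = mean(mpg), .groups = "drop")

all_combos
```

With this grouping, the 4 gears / 8 cylinders combination should appear with n equal to 0 and a mean of NaN, much like the tabular() output.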
The output object from tables::tabular() can be printed in a high quality table with knitr::kable() and enhanced with the features in the kableExtra package.
If the desired outcome is simply to print descriptive statistics for a variable given a set of by-group variables, we can also use psych::describeBy().
library(psych)
describeBy(mtcars$mpg, list(mtcars$gear, mtcars$cyl))
...and the first few rows of output:
> describeBy(mtcars$mpg,list(mtcars$gear,mtcars$cyl))
Descriptive statistics by group
: 3
: 4
   vars n mean sd median trimmed mad  min  max range skew kurtosis se
X1    1 1 21.5 NA   21.5    21.5   0 21.5 21.5     0   NA       NA NA
-------------------------------------------------------------
: 4
: 4
   vars n  mean   sd median trimmed  mad  min  max range skew kurtosis  se
X1    1 8 26.92 4.81  25.85   26.92 5.56 21.4 33.9  12.5 0.25    -1.84 1.7
-------------------------------------------------------------
Bottom line: there are many ways to accomplish a task in R, and it's important to know how the results will be used in order to determine the "best" solution for a particular situation.