I want to use the function skim from the R package skimr to produce summary statistics for multiple datasets. To save space, I need to prioritize the information that gets displayed. I would like to remove these rows from the Data Summary section of the skim output: "Name", "Column type frequency", and "Group variables". Is there an easy way to do this?
I tried skim(iris) and got the following:
-- Data Summary ------------------------
Values
Name iris
Number of rows 150
Number of columns 5
_______________________
Column type frequency:
factor 1
numeric 4
________________________
Group variables None
-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
skim_variable n_missing complete_rate ordered n_unique top_counts
* <chr> <int> <dbl> <lgl> <int> <chr>
1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
Instead, I want to display the following:
-- Data Summary ------------------------
Values
Number of rows 150
Number of columns 5
-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
skim_variable n_missing complete_rate ordered n_unique top_counts
* <chr> <int> <dbl> <lgl> <int> <chr>
1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
The function skim returns an object of S3 class "skim_df", which subclasses "tbl_df", "tbl", and "data.frame", and a print method exists for that class. This print method has an argument include_summary that can be set to FALSE to skip printing the Data Summary information.
s <- skimr::skim(iris)
class(s)
#> [1] "skim_df" "tbl_df" "tbl" "data.frame"
Created on 2022-03-23 by the reprex package (v2.0.1)
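If you want to confirm that this print method really accepts include_summary, you can list its formal arguments. This is a small sketch that assumes skimr is installed; it uses ::: to reach the method directly, because S3 print methods are usually registered rather than exported (getS3method("print", "skim_df") would be an equivalent lookup).

```r
library(skimr)

# Look up the S3 print method for class "skim_df" directly.
print_method <- skimr:::print.skim_df

# List its formal arguments; include_summary should be among them.
names(formals(print_method))
```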
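To skip the summary, just run the following. Note that this drops the entire Data Summary block, including "Number of rows" and "Number of columns":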
print(s, include_summary = FALSE)
#-- Variable type: factor ----------------------------------------------------------------------------------------------------------------
## A tibble: 1 x 6
# skim_variable n_missing complete_rate ordered n_unique top_counts
#* <chr> <int> <dbl> <lgl> <int> <chr>
#1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
#
#-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------
## A tibble: 4 x 11
# skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
#* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
#2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
#3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
#4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
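Since include_summary = FALSE removes the row and column counts as well, one way to get closer to the output asked for in the question is to print those two counts yourself and then print the skim tables without the summary. The sketch below assumes skimr is installed; skim_compact is a hypothetical helper (not part of skimr), and its header lines are a plain-text approximation rather than skimr's own Data Summary formatting. It also shows the helper applied to several datasets in a loop.

```r
library(skimr)

# Hypothetical helper (not part of skimr): print only the row/column counts,
# then the per-type skim tables without the Data Summary block.
skim_compact <- function(data, name = "data") {
  s <- skim(data)
  cat("--", name, "--\n")
  cat("Number of rows   ", nrow(data), "\n")
  cat("Number of columns", ncol(data), "\n")
  print(s, include_summary = FALSE)
  invisible(s)  # return the skim_df so it can still be inspected
}

# Apply the helper to several datasets in turn.
datasets <- list(iris = iris, mtcars = mtcars)
for (nm in names(datasets)) {
  skim_compact(datasets[[nm]], name = nm)
}
```

Returning the skim_df invisibly keeps the console output limited to the compact printout while still letting you assign the result if you need it later.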