
skimr: How to customize Data Summary in skim() output?

I want to use the function skim from R package skimr to produce summary statistics of multiple datasets. To save space, I need to prioritize information that gets displayed. I would like to remove these rows from the Data Summary section of skim output: "Name", "Column type frequency", and "Group variables". Is there an easy way to do this?

I tried skim(iris) and got the following:

-- Data Summary ------------------------
                           Values
Name                       iris  
Number of rows             150   
Number of columns          5     
_______________________          
Column type frequency:           
  factor                   1     
  numeric                  4     
________________________         
Group variables            None  

-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
  skim_variable n_missing complete_rate ordered n_unique top_counts               
* <chr>             <int>         <dbl> <lgl>      <int> <chr>                    
1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50

-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
* <chr>             <int>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length          0             1  5.84 0.828   4.3   5.1  5.8    6.4   7.9 ▆▇▇▅▂
2 Sepal.Width           0             1  3.06 0.436   2     2.8  3      3.3   4.4 ▁▆▇▂▁
3 Petal.Length          0             1  3.76 1.77    1     1.6  4.35   5.1   6.9 ▇▁▆▇▂
4 Petal.Width           0             1  1.20 0.762   0.1   0.3  1.3    1.8   2.5 ▇▁▇▅▃

Instead, I want to display the following:

-- Data Summary ------------------------
                           Values 
Number of rows             150   
Number of columns          5     

-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
  skim_variable n_missing complete_rate ordered n_unique top_counts               
* <chr>             <int>         <dbl> <lgl>      <int> <chr>                    
1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50

-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
* <chr>             <int>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length          0             1  5.84 0.828   4.3   5.1  5.8    6.4   7.9 ▆▇▇▅▂
2 Sepal.Width           0             1  3.06 0.436   2     2.8  3      3.3   4.4 ▁▆▇▂▁
3 Petal.Length          0             1  3.76 1.77    1     1.6  4.35   5.1   6.9 ▇▁▆▇▂
4 Petal.Width           0             1  1.20 0.762   0.1   0.3  1.3    1.8   2.5 ▇▁▇▅▃

The function skim returns an object of S3 class "skim_df", which sub-classes "tbl_df", "tbl", and "data.frame", and skimr provides a print method for that class. This print method has an argument, include_summary, that can be set to FALSE to suppress the Data Summary block.

s <- skimr::skim(iris)
class(s)
#> [1] "skim_df"    "tbl_df"     "tbl"        "data.frame"

Created on 2022-03-23 by the reprex package (v2.0.1)

To answer the question, just run

print(s, include_summary = FALSE)
#-- Variable type: factor ----------------------------------------------------------------------------------------------------------------
## A tibble: 1 x 6
#  skim_variable n_missing complete_rate ordered n_unique top_counts               
#* <chr>             <int>         <dbl> <lgl>      <int> <chr>                    
#1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50
#
#-- Variable type: numeric ---------------------------------------------------------------------------------------------------------------
## A tibble: 4 x 11
#  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
#* <chr>             <int>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#1 Sepal.Length          0             1  5.84 0.828   4.3   5.1  5.8    6.4   7.9 ▆▇▇▅▂
#2 Sepal.Width           0             1  3.06 0.436   2     2.8  3      3.3   4.4 ▁▆▇▂▁
#3 Petal.Length          0             1  3.76 1.77    1     1.6  4.35   5.1   6.9 ▇▁▆▇▂
#4 Petal.Width           0             1  1.20 0.762   0.1   0.3  1.3    1.8   2.5 ▇▁▇▅▃
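
Note that include_summary = FALSE drops the whole Data Summary block, including the row and column counts the question wanted to keep. One possible workaround, sketched below, is to print those two lines yourself and then print the skim tables without the built-in summary. The helper names compact_header and print_compact_skim are hypothetical (they are not part of skimr), the counts come from base R's nrow() and ncol() rather than from the skim object, and the header layout is only an approximation of skimr's own formatting.

```r
# Hypothetical helper (not part of skimr): builds the two summary lines
# the question wants to keep, using base R only.
compact_header <- function(data) {
  c(
    sprintf("%-26s %d", "Number of rows", nrow(data)),
    sprintf("%-26s %d", "Number of columns", ncol(data))
  )
}

# Hypothetical helper: prints the short header, then the skim tables
# with skimr's built-in Data Summary block switched off.
print_compact_skim <- function(data) {
  cat("-- Data Summary ------------------------\n")
  cat(compact_header(data), sep = "\n")
  cat("\n")
  print(skimr::skim(data), include_summary = FALSE)
  invisible(data)
}

# Only run the demo when skimr is installed.
if (requireNamespace("skimr", quietly = TRUE)) {
  print_compact_skim(iris)
}
```

For several datasets, the same helper could be applied in a loop, for example invisible(lapply(list(iris, mtcars), print_compact_skim)).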


 