
Standardize column names in Excel sheets before combining with purrr and readxl

I would like to combine the tabs of an Excel file, labeled by year (2016, 2015, 2014, etc.), into a single data frame. Each tab holds the same data, but column names may be spelled differently from year to year.

I would like to standardize the columns in each sheet before combining.

This is the generic way of combining sheets with purrr and readxl for such tasks:

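library(purrr)   # set_names(), map_dfr(), %>%
library(readxl)  # excel_sheets(), read_excel()

# my.file is the path to the workbook
combined.df <- excel_sheets(my.file) %>%
  set_names() %>%
  map_dfr(read_excel, path = my.file, .id = "sheet")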

...however, as noted, this creates separate columns for "COLUMN ONE" and "Column One", which hold the same data.

Inserting make.names into the pipeline would probably be the best solution.
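As a rough illustration of what make.names does to headers like these, here is a minimal base R sketch. The vector `raw` is made up for the example. It suggests that make.names on its own keeps the original case and turns padding into dots, so trimming and case folding would need to happen first.

```r
# Hypothetical header spellings, as they might appear across sheets
raw <- c("COLUMN ONE", "Column One", " column one ")

# make.names() alone: spaces become ".", case is kept,
# and a name that starts with a space gets an "X" prefix
make.names(raw)
#> [1] "COLUMN.ONE"    "Column.One"    "X.column.one."

# Trim, then upper-case, then make syntactic: all three collapse to one name
std <- make.names(toupper(trimws(raw)))
std
#> [1] "COLUMN.ONE" "COLUMN.ONE" "COLUMN.ONE"
```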

Keeping it all together would be ideal, something like:

   combined.df <- excel_sheets(my.file) %>% 
    set_names() %>% 
    map(read_excel, path = my.file) %>% 
    map(~(names(.) %>%  #<---WRONG
            make.names() %>% 
            str_to_upper() %>% 
            str_trim() %>% 
            set_names()) ) 

...but the syntax is all wrong.

Rather than defining your own function, the clean_names function from the janitor package may be able to help you. It takes a data frame/tibble as input and returns a data frame/tibble with clean names as output.

Here's an example:

library(tidyverse)

tibble(" a col name" = 1,
       "another-col-NAME" = 2,
       "yet another name  " = 3) %>% 
    janitor::clean_names()
#> # A tibble: 1 x 3
#>   a_col_name another_col_name yet_another_name
#>        <dbl>            <dbl>            <dbl>
#> 1          1                2                3

You can then plop it right into the code you gave:

combined.df <- excel_sheets(my.file) %>% 
    set_names() %>%
    map(read_excel, path = my.file) %>%  #<Import as list, not dfr
    map(janitor::clean_names) %>%        #<janitor::clean_names
    bind_rows(.id = "sheet")
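To try the same pipeline without a workbook at hand, the sketch below swaps the read_excel step for a made-up named list of tibbles (`sheets` and its contents are assumptions, not part of the original question). With the default snake-case settings of clean_names, both spellings should end up in the same column.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for the imported sheets: one tibble per year
sheets <- list(
  "2016" = tibble::tibble(`COLUMN ONE` = 1, `Column Two` = "a"),
  "2015" = tibble::tibble(`Column One` = 2, `column two` = "b")
)

combined <- sheets %>%
  map(janitor::clean_names) %>%   # standardize names sheet by sheet
  bind_rows(.id = "sheet")        # stack, keeping the sheet name as a column

names(combined)
#> [1] "sheet"      "column_one" "column_two"
```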

Defining your own function is doable, but it is more verbose and uses two maps:

# User-defined function: col_rename
# Requires stringr (loaded with the tidyverse)
col_rename <- function(df) {
  names(df) <- names(df) %>%
    str_trim() %>%       # trim first: make.names() would turn padding into "."
    str_to_upper() %>%
    make.names()
  df
}

combined.df <- excel_sheets(my.file) %>%
    set_names() %>%
    map(read_excel, path = my.file) %>%  #<Import as list, not dfr
    map(col_rename) %>%                  #<Fix colnames (user defined function)
    bind_rows(.id = "sheet")
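If a single map is preferred, one possible variant is to rename inside the mapped function with dplyr's rename_with. This is only a sketch: the helper `std_names` and the in-memory list `sheets` are hypothetical names introduced here so the example runs without an Excel file.

```r
library(purrr)
library(dplyr)
library(stringr)

# Hypothetical stand-in for the workbook: one tibble per sheet
sheets <- list(
  "2016" = tibble::tibble(`COLUMN ONE` = 1:2, ` Column Two` = c("a", "b")),
  "2015" = tibble::tibble(`Column One` = 3:4, `column two ` = c("c", "d"))
)

# Hypothetical helper: trim, upper-case, then make the names syntactic
std_names <- function(x) make.names(str_to_upper(str_trim(x)), unique = TRUE)

# One map: rename each sheet, then row-bind with the sheet name as an id
combined <- sheets %>%
  map_dfr(~ rename_with(.x, std_names), .id = "sheet")

names(combined)
#> [1] "sheet"      "COLUMN.ONE" "COLUMN.TWO"
```

With a real workbook, the same idea would presumably read and rename in one step, for example `map_dfr(~ read_excel(my.file, sheet = .x) %>% rename_with(std_names), .id = "sheet")` applied to the named vector from `excel_sheets(my.file) %>% set_names()`.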
