
Is there an R function to identify and modify column names across dataframes in the same List?

I am downloading an Excel workbook with data across multiple years, separated by year into different sheets. Each sheet has approximately 70 columns, each with a column name.

Unfortunately, some columns have slightly different names for the same data. I read the sheets in with:

library(readxl)

# Read every sheet into a list, named by sheet
sheets <- excel_sheets(filename)
SheetList <- lapply(sheets, read_excel, path = filename)
names(SheetList) <- sheets

which loads a list of 13 elements (dataframes) separated by year. If I look at the 2019 colnames I get

colnames(SheetList[[1]])

[1] "Number"                                               
 [2] "Year-Round Vacancy"                       
 [3] "Premier Beds"                
 [4] "Total Year Round Beds"
.

and so on, versus the 2013 colnames

colnames(SheetList[[6]])

[1] "Number"                                               
 [2] "Year-Round Vacancy"                       
 [3] "Premier Rooms"                
 [4] "Total Year Round Rooms"

...and so on.

In these two cases, the columns hold the same data but are labeled differently.

I understand I could use str_replace_all on column names three and four, but I was curious whether there is a more elegant way to identify discrepancies and rename columns (where applicable).

Assuming the columns are arranged in the same order and represent the same data, you can create a vector of names and assign it to each dataframe:

column_names <- c('Number', 'Vacancy', 'Premier', 'Total')

names(Sheet1) <- column_names
names(Sheet2) <- column_names 
...

I understand that for 70+ columns this would be an inconvenient vector to write out, so I'm not sure this helps.
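Since the sheets already live in a list, the same assignment can be applied to every element in one call instead of one line per dataframe. A minimal sketch, assuming every sheet has the same number of columns in the same order; the two toy dataframes below are hypothetical stand-ins for the real SheetList:

```r
# Hypothetical stand-in for the list returned by read_excel()
SheetList <- list(
  `2019` = data.frame(Number = 1:2, `Year-Round Vacancy` = c(3, 4),
                      `Premier Beds` = c(5, 6),
                      `Total Year Round Beds` = c(7, 8),
                      check.names = FALSE),
  `2013` = data.frame(Number = 1:2, `Year-Round Vacancy` = c(3, 4),
                      `Premier Rooms` = c(5, 6),
                      `Total Year Round Rooms` = c(7, 8),
                      check.names = FALSE)
)

column_names <- c("Number", "Vacancy", "Premier", "Total")

# Guard: positional renaming only makes sense if the column counts match
stopifnot(all(vapply(SheetList, ncol, integer(1)) == length(column_names)))

# setNames() returns each dataframe with its column names replaced
SheetList <- lapply(SheetList, setNames, column_names)
```

lapply() keeps the list's own names (the years), so only the column names inside each dataframe change.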

You might also just assign the names from one dataframe to another:

names(Sheet1) <- names(Sheet2)

This would sync them up, provided both dataframes have the same number of columns in the same order.
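Because positional assignment silently mislabels data if the column order ever differs, it may be safer to list the discrepancies first and then rename by name. A rough sketch in base R, not a definitive solution: the toy SheetList, the rename_map lookup, and the rename_cols helper are all hypothetical names introduced for illustration, and the first sheet is assumed to hold the preferred names.

```r
# Hypothetical stand-in for the list returned by read_excel()
SheetList <- list(
  `2019` = data.frame(Number = 1:2, `Premier Beds` = c(5, 6),
                      `Total Year Round Beds` = c(7, 8),
                      check.names = FALSE),
  `2013` = data.frame(Number = 1:2, `Premier Rooms` = c(5, 6),
                      `Total Year Round Rooms` = c(7, 8),
                      check.names = FALSE)
)

# Assumption: the first sheet carries the preferred column names
reference <- names(SheetList[[1]])

# For each sheet, collect the names that do not appear in the reference,
# then keep only the sheets that have at least one such name
mismatches <- lapply(SheetList, function(df) setdiff(names(df), reference))
mismatches <- Filter(length, mismatches)
print(mismatches)

# Hypothetical lookup built by hand from the mismatches: old name = new name
rename_map <- c("Premier Rooms" = "Premier Beds",
                "Total Year Round Rooms" = "Total Year Round Beds")

# Rename only the columns found in the lookup; leave the rest untouched
rename_cols <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

SheetList <- lapply(SheetList, rename_cols, map = rename_map)
```

With 70 columns the lookup only needs entries for the names that actually differ, which is usually much shorter than a full vector of names.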
