Is there an R function to identify and modify column names across dataframes in the same List?
I am downloading an Excel workbook with data across multiple years, separated by year into different sheets. Each sheet has approximately 70 columns and col_names.
Unfortunately, some columns have slightly different names for the same data.
library(readxl)  # provides excel_sheets() and read_excel()
sheets <- excel_sheets(filename)
SheetList <- lapply(sheets, read_excel, path = filename)
names(SheetList) <- sheets
which loads a list of 13 elements (dataframes), one per year. If I look at the 2019 colnames, I get:
colnames(SheetList[[1]])
[1] "Number"
[2] "Year-Round Vacancy"
[3] "Premier Beds"
[4] "Total Year Round Beds"
.
and so on, versus the 2013 colnames:
colnames(SheetList[[6]])
[1] "Number"
[2] "Year-Round Vacancy"
[3] "Premier Rooms"
[4] "Total Year Round Rooms"
...and so on
In these two cases, the columns hold the same data under different labels.
I understand I could use str_replace_all on column names three and four, but I was curious whether there is a more elegant way to identify discrepancies and rename columns (where applicable).
Assuming the columns are arranged in the same order and represent the same data, you can create a vector of names and quickly assign it to each dataframe:
column_names <- c('Number', 'Vacancy', 'Premier', 'Total')
names(Sheet1) <- column_names
names(Sheet2) <- column_names
...
I understand that for 70+ columns that would be a rather inconvenient vector to type out, so I'm not sure this helps.
You might also just assign the names from one dataframe to another:
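Since the sheets already sit in a list, the same assignment can be applied to every element in one pass instead of one line per sheet. Here is a minimal sketch in base R, assuming every sheet has the same columns in the same order; the two toy dataframes are hypothetical stand-ins for the real `SheetList`:

```r
# Hypothetical stand-ins for two of the sheets in SheetList
SheetList <- list(
  `2019` = data.frame(Number = 1:2, `Year-Round Vacancy` = c(0.1, 0.2),
                      `Premier Beds` = c(5, 6), `Total Year Round Beds` = c(50, 60),
                      check.names = FALSE),
  `2013` = data.frame(Number = 3:4, `Year-Round Vacancy` = c(0.3, 0.4),
                      `Premier Rooms` = c(7, 8), `Total Year Round Rooms` = c(70, 80),
                      check.names = FALSE)
)

column_names <- c("Number", "Vacancy", "Premier", "Total")

# Guard: positional renaming is only safe if the column counts match
stopifnot(all(lengths(SheetList) == length(column_names)))

# setNames() returns each dataframe with the new names; list names are kept
SheetList <- lapply(SheetList, setNames, column_names)
```

The `stopifnot()` guard only checks the column count, not the order, so it is still worth eyeballing a couple of sheets before trusting the result.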
names(Sheet1) <- names(Sheet2)
This would sync them up, provided both dataframes have the same number of columns in the same order.
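If the column order cannot be trusted, one option is to first list which names do not appear in every sheet and then rename by name rather than by position. The following is a rough sketch in base R, not a definitive solution: the toy `SheetList`, the `rename_map` lookup, and the `rename_cols()` helper are all made up for illustration, and the map would have to be filled in by hand after inspecting the discrepancies.

```r
# Hypothetical stand-ins for two of the sheets in SheetList
SheetList <- list(
  `2019` = data.frame(Number = 1:2, `Year-Round Vacancy` = c(0.1, 0.2),
                      `Premier Beds` = c(5, 6), `Total Year Round Beds` = c(50, 60),
                      check.names = FALSE),
  `2013` = data.frame(Number = 3:4, `Year-Round Vacancy` = c(0.3, 0.4),
                      `Premier Rooms` = c(7, 8), `Total Year Round Rooms` = c(70, 80),
                      check.names = FALSE)
)

# Every distinct column name, and a TRUE/FALSE matrix of which sheet has it
all_names <- unique(unlist(lapply(SheetList, names)))
presence <- sapply(SheetList, function(df) all_names %in% names(df))
rownames(presence) <- all_names

# Names missing from at least one sheet are the candidates for renaming
discrepancies <- presence[rowSums(presence) < ncol(presence), , drop = FALSE]
print(discrepancies)

# Hand-written lookup (an assumption): old name -> canonical name
rename_map <- c("Premier Rooms" = "Premier Beds",
                "Total Year Round Rooms" = "Total Year Round Beds")

# Rename only the columns that appear in the lookup; leave the rest alone
rename_cols <- function(df, map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

SheetList <- lapply(SheetList, rename_cols, map = rename_map)
```

Because the renaming is keyed on the old name, it does not matter where a column sits in a given sheet, and sheets that already use the canonical names pass through unchanged.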