How to simultaneously read in Excel sheets and mutate a new column with purrr/dplyr?
I'm trying to read in a time series dataset that is spread over multiple years, one sheet per year (so the sheet names are the respective years).
I want to read in each sheet and then mutate a new column called "year" that's equal to the sheet name. I'm not sure how to do this all in one fell swoop.
All I have right now is this:
map(excel_sheets(path), read_excel, path = path, skip = 1)
Here is one possible solution.
Let's say you have an Excel file "ts.xlsx" with 3 sheets ("2016", "2017", "2018").
Each sheet has 3 values in column "A":
"2016" - (1, 2, 3);
"2017" - (4, 5, 6);
"2018" - (7, 8, 9).
To read these data into one table with two columns ("data", "year"), you can use the following R code:
# 1. Library
library(xlsx)
# 2. Excel file
excel_file <- "ts.xlsx"
# 3. Load the workbook
wb <- loadWorkbook(excel_file)
# 4. Names and number of sheets
sheets_names <- names(getSheets(wb))
sheets_count <- wb$getNumberOfSheets()
# 5. Read the Excel file sheet by sheet
for (i in seq_len(sheets_count)) {
  # 5.1. Read one sheet (no header row) and tag each row with the sheet name
  df_sheet_year <- read.xlsx(excel_file, i, header = FALSE)
  df_sheet_year$name <- sheets_names[i]
  # 5.2. Append to the result dataset
  if (i == 1) {
    df_sheet <- df_sheet_year
  } else {
    df_sheet <- rbind(df_sheet, df_sheet_year)
  }
}
# 6. Rename columns
colnames(df_sheet) <- c("data", "year")
# 7. Check the result dataset
df_sheet
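Since the question asked for purrr/dplyr, the same result can probably be reached without an explicit loop. The following is only a minimal sketch, assuming the readxl, purrr and dplyr packages are installed; the helper name read_sheets_with_year is hypothetical and not part of any package. It names the vector of sheet names by itself, reads every sheet, and lets bind_rows() turn those names into a "year" column.

```r
library(readxl)
library(purrr)
library(dplyr)

# Hypothetical helper: read every sheet of a workbook and stack the results,
# recording the sheet name in a "year" column.
# Extra arguments (e.g. skip = 1, col_names = "data") are passed to read_excel().
read_sheets_with_year <- function(path, ...) {
  sheets <- excel_sheets(path)
  sheets %>%
    set_names() %>%                                          # name each element by its sheet name
    map(function(sheet) read_excel(path, sheet = sheet, ...)) %>%
    bind_rows(.id = "year")                                  # list names become the "year" column
}

# Usage with the example workbook from the answer above (sheets have no header row)
path <- "ts.xlsx"
if (file.exists(path)) {
  ts_data <- read_sheets_with_year(path, col_names = "data")
  print(ts_data)
}
```

Note that "year" comes back as a character column because it is taken from the sheet names; something like mutate(year = as.integer(year)) could follow if a numeric year is needed. For the data in the question, passing skip = 1 instead of col_names = "data" should reproduce the original read_excel() call.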