Comparing column headers across multiple sheets of an Excel file and extracting them into R

Question

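I have an Excel file with data spread across several sheets, and I need to combine that data so I can draw insights from it.

The sheets are named after each month, running from November through October (12 sheets in total).

My code starts like this: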

library(readxl)

# List of months to look at (sheet names, November through October)
months = c("November", "December", "January", "February", "March", "April", "May", "June", "July", "August", "September", "October")

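What I want to do is match the column names in each sheet against an empty data frame (which I call discrepancies) and pull the data into those columns accordingly. My code looks like this: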

discrepancies <-
  setNames(
    data.frame(matrix(ncol = 12, nrow = 0)),
    c(
      "Date",
      "Officer",
      "Case Number",
      "Account Number",
      "Plan Type",
      "Type",
      "ID",
      "Transaction Amount",
      "Code",
      "Specialist",
      "Transit#",
      "Processed Via"
      )
  )
#Query for each month's data and append to the main dataframe
for (i in months) {
  temp <- read_excel(
    "G:/Confidental.xlsx",
    sheet = i,
    col_names = TRUE,
    skip = 0
  )
  temp$`months` <- i
  discrepancies <- rbind(discrepancies, temp)
}

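Instead of keeping only the columns I want, this code pulls in every field from each sheet, and it fails when a sheet has a different number of columns than the discrepancies data frame. Any help is appreciated.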

Answer 1

One possible solution is along the lines of this example:

# verification data.frame
descrepancies <- data.frame(Col1=character(),
                            Col2=character(),
                            Col3=character())
# test 1: one column missing
df1 <- data.frame(Col1= c(1,1),
                  Col3= c(1,1))
# test 2: one column that is not in discrepancies
df2 <- data.frame(Col1= c(2,2),
                  Col4= c(2,2))
# test 3: all columns are matching
df3 <- data.frame(Col1= c(3,3),
                  Col2= c(3,3),
                  Col3= c(3,3))

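The steps are: get the column names of the test data.frame, add NA columns for any names that exist in descrepancies but are missing from the test data.frame, then select from the test data.frame only the columns that exist in descrepancies. I ran this three times to cover all the cases and built a final df to show that it works.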

# get column names from descrepancies to check the tests
nd <- colnames(descrepancies)

# run procedure on test 1
nf1 <- colnames(df1)
df1[, nd[!nd %in% nf1]] <- NA
descrepancies <- rbind(descrepancies, df1[, nd])

# run procedure on test 2
nf2 <- colnames(df2)
df2[, nd[!nd %in% nf2]] <- NA
descrepancies <- rbind(descrepancies, df2[, nd])

# run procedure on test 3
nf3 <- colnames(df3)
df3[, nd[!nd %in% nf3]] <- NA
descrepancies <- rbind(descrepancies, df3[, nd])

# print the final df
descrepancies

  Col1 Col2 Col3
1    1   NA    1
2    1   NA    1
3    2   NA   NA
4    2   NA   NA
5    3    3    3
6    3    3    3
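
The three repeated steps above could be folded into a small helper and applied to a whole list of data frames. This is only a sketch, not part of the original answer: `align_columns` is a hypothetical helper name, and the in-memory data frames stand in for sheets that would normally come from `read_excel`.

```r
# Hypothetical helper (not from the original answer): pad a data frame with
# NA columns for any wanted names it lacks, then keep only the wanted columns
# in the wanted order.
align_columns <- function(df, wanted) {
  missing_cols <- setdiff(wanted, colnames(df))
  df[missing_cols] <- NA
  df[, wanted, drop = FALSE]
}

wanted <- c("Col1", "Col2", "Col3")

# Stand-ins for the monthly sheets; in practice each element would come from
# read_excel(path, sheet = month).
sheets <- list(
  November = data.frame(Col1 = c(1, 1), Col3 = c(1, 1)),
  December = data.frame(Col1 = c(2, 2), Col4 = c(2, 2)),
  January  = data.frame(Col1 = c(3, 3), Col2 = c(3, 3), Col3 = c(3, 3))
)

# Align every sheet, tag it with its month, and stack the results once.
aligned <- lapply(names(sheets), function(m) {
  out <- align_columns(sheets[[m]], wanted)
  out$months <- m
  out
})
combined <- do.call(rbind, aligned)
rownames(combined) <- NULL
```

Binding once at the end with `do.call(rbind, ...)` avoids growing the data frame inside a loop, and the extra `Col4` column is dropped because it is not in `wanted`.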

Answer 2

I don't think you need to create an empty data frame to compare all the columns. Try this approach:

library(readxl)
result <- purrr::map_df(months, ~read_excel("G:/Confidental.xlsx",sheet = .x), 
                       .id = 'months')

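This combines all the sheets into a single data frame. If some columns are missing from a sheet, NA is automatically inserted in those columns for that month.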
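
To illustrate the NA filling and the original request of keeping only the wanted columns, here is a minimal sketch. It assumes the dplyr package is installed and uses small made-up data frames in place of the Excel sheets; the names `sheets` and `wanted` and all values are invented for the example. As far as I know, `.id` stores the element name only when the input is named (otherwise it stores the position), so the list is named here; with `map_df` the equivalent would be to pass `purrr::set_names(months)` instead of `months`.

```r
library(dplyr)

# Made-up stand-ins for two monthly sheets with different columns.
sheets <- list(
  November = data.frame(Date = "2019-11-01", Officer = "A", Extra = 1),
  December = data.frame(Date = "2019-12-01", Code = "X")
)

# Columns to keep (a subset of the question's list, for brevity).
wanted <- c("Date", "Officer", "Code")

# bind_rows() matches columns by name and fills the gaps with NA;
# any_of() keeps only the wanted columns that are actually present.
result <- bind_rows(sheets, .id = "months") %>%
  select(months, any_of(wanted))
```

Here `Extra` is dropped by `select()`, while `Officer` for December and `Code` for November come back as NA.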

2 answers

Answer 1
1 2020-11-05 00:02:40

Answer 2
1 Accepted 2020-11-05 03:23:36
