Need to subset the main dataframe using the column information on another file
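Please help. I have a main dataset that I need to subset by column, and the column information lives in another file. In this example, I want to create 3 data frames from the main file, with the required columns listed row by row in ColData: c(XX, CE.02), c(YY, CE.03, CE.01) and c(ZZ, CE.05).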
XX <- c(1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0)
YY <- c(0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0)
ZZ <- c(1,0,1,1,0,0,0,1,0,1,0,0,1,0,1,1)
AL.01 <- c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0)
AL.02 <- c(NA,0,0,NA,NA,0,NA,0,0,4,0,0,0,2,0,0)
AL.03 <- c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0)
CE.01 <- c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0)
CE.02 <- c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,2,0,0)
CE.03 <- c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,2,0,0)
CE.04 <- c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,1,0,0)
CE.05 <- c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,1,0,0)
RCAQA <- c('XX','YY','ZZ')
QuestionID1 <- c('CE.02','CE.03','CE.05')
QuestionID2 <- c('','CE.01','')
MainData <- data.frame(XX,YY,ZZ,AL.01,AL.02,AL.03,CE.01,CE.02,CE.03,CE.04,CE.05)
ColData <- data.frame(RCAQA,QuestionID1,QuestionID2)
Desired output data frame 1: c(XX, CE.02)
Desired output data frame 2: c(YY, CE.03, CE.01)
Desired output data frame 3: c(ZZ, CE.05)
This is how I was approaching df1, df2 and df3:
df1 <- MainData %>% select(one_of(as.character(as.vector(ColData[1]))))
df2 <- MainData %>% select(one_of(as.character(as.vector(ColData[2]))))
df3 <- MainData %>% select(one_of(as.character(as.vector(ColData[3]))))
A base R solution:
dfs <- apply(ColData, 1L, function(i, df) df[, i[i != ""]], MainData)
df1 <- dfs[[1L]]
df2 <- dfs[[2L]]
df3 <- dfs[[3L]]
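One thing to watch with `df[, cols]`: if a row of ColData ever named only a single column, base R would return a plain vector instead of a data frame. Below is a minimal sketch of a more defensive variant, assuming the same MainData and ColData as in the question. The `stringsAsFactors = FALSE` argument and the idea of naming the list by RCAQA are my additions, not part of the answer above.

```r
# Data copied from the question.
MainData <- data.frame(
  XX    = c(1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0),
  YY    = c(0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0),
  ZZ    = c(1,0,1,1,0,0,0,1,0,1,0,0,1,0,1,1),
  AL.01 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0),
  AL.02 = c(NA,0,0,NA,NA,0,NA,0,0,4,0,0,0,2,0,0),
  AL.03 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0),
  CE.01 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0),
  CE.02 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,2,0,0),
  CE.03 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,2,0,0),
  CE.04 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,1,0,0),
  CE.05 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,1,0,0)
)
ColData <- data.frame(
  RCAQA       = c("XX", "YY", "ZZ"),
  QuestionID1 = c("CE.02", "CE.03", "CE.05"),
  QuestionID2 = c("", "CE.01", ""),
  stringsAsFactors = FALSE  # keep the column names as plain strings
)

# Loop over the rows of ColData by index.
dfs <- lapply(seq_len(nrow(ColData)), function(r) {
  cols <- as.character(unlist(ColData[r, ], use.names = FALSE))
  cols <- cols[!is.na(cols) & cols != ""]   # skip empty cells
  MainData[, cols, drop = FALSE]            # drop = FALSE always keeps a data frame
})

# Name each piece after its first column so it can be fetched by name.
names(dfs) <- ColData$RCAQA
```

With this, `dfs$YY` (or `dfs[["YY"]]`) holds the columns YY, CE.03 and CE.01, in the order they appear in ColData.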
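We can use asplit to split ColData by row and lapply to select the matching columns from MainData. intersect picks out the names that are common to both, which also discards the empty strings. This gives you a list of data frames.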
lapply(asplit(ColData, 1), function(x) MainData[intersect(names(MainData), x)])
#[[1]]
# XX CE.02
#1 1 NA
#2 0 0
#3 0 0
#4 1 NA
#5 0 NA
#...
#[[2]]
# YY CE.01 CE.03
#1 0 NA NA
#2 1 0 0
#3 0 0 0
#4 1 NA NA
#5 0 NA NA
#6 0 0 0
#7 1 NA NA
#...
#[[3]]
# ZZ CE.05
#1 1 NA
#2 0 0
#3 1 0
#4 1 NA
#5 0 NA
#6 0 0
#...
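To see what the two helper functions contribute, it may help to look at the intermediate values for one row. This is only an illustrative sketch: `main_names` stands in for `names(MainData)` and the other variable names are made up for the example. The point worth noting is that `intersect()` returns its matches in the order of its first argument, so the argument order decides whether the columns follow MainData or ColData.

```r
# Column names of MainData and the lookup table, copied from the question.
main_names <- c("XX", "YY", "ZZ", "AL.01", "AL.02", "AL.03",
                "CE.01", "CE.02", "CE.03", "CE.04", "CE.05")
ColData <- data.frame(
  RCAQA       = c("XX", "YY", "ZZ"),
  QuestionID1 = c("CE.02", "CE.03", "CE.05"),
  QuestionID2 = c("", "CE.01", ""),
  stringsAsFactors = FALSE
)

# asplit() turns each row of ColData into one character vector.
rows <- asplit(ColData, 1)
row2 <- as.character(rows[[2]])   # "YY" "CE.03" "CE.01"
row1 <- as.character(rows[[1]])   # "XX" "CE.02" ""

# intersect() keeps only real column names, so "" disappears.
cols_row1 <- intersect(main_names, row1)          # "XX" "CE.02"

# Matches come back in the order of the first argument.
in_main_order    <- intersect(main_names, row2)   # "YY" "CE.01" "CE.03"
in_coldata_order <- intersect(row2, main_names)   # "YY" "CE.03" "CE.01"
```

If the column order given in ColData matters, swapping the arguments as in `in_coldata_order` is one way to keep it.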
Using dplyr, you can do:
library(dplyr)
ColData %>%
  group_split(row_number(), .keep = FALSE) %>%
  purrr::map(~MainData %>% select(any_of(unlist(.x))))
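If you would rather get a named list back, the same pipeline can be extended a little. This is a sketch under a few assumptions: dplyr and purrr are installed, ColData holds plain strings (hence `stringsAsFactors = FALSE`), and naming the result after RCAQA is my own choice rather than something the question asked for. Empty cells are filtered out explicitly before selecting.

```r
library(dplyr)

# Data copied from the question.
MainData <- data.frame(
  XX    = c(1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0),
  YY    = c(0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0),
  ZZ    = c(1,0,1,1,0,0,0,1,0,1,0,0,1,0,1,1),
  AL.01 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0),
  AL.02 = c(NA,0,0,NA,NA,0,NA,0,0,4,0,0,0,2,0,0),
  AL.03 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0),
  CE.01 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,3,0,0),
  CE.02 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,2,0,0),
  CE.03 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,2,0,0),
  CE.04 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,1,0,0),
  CE.05 = c(NA,0,0,NA,NA,0,NA,0,0,3,0,0,0,1,0,0)
)
ColData <- data.frame(
  RCAQA       = c("XX", "YY", "ZZ"),
  QuestionID1 = c("CE.02", "CE.03", "CE.05"),
  QuestionID2 = c("", "CE.01", ""),
  stringsAsFactors = FALSE
)

dfs <- ColData %>%
  group_split(row_number(), .keep = FALSE) %>%   # one 1-row tibble per ColData row
  purrr::map(function(row) {
    cols <- unlist(row, use.names = FALSE)
    cols <- cols[cols != ""]                     # ignore empty cells
    select(MainData, any_of(cols))               # any_of() skips names not in MainData
  }) %>%
  purrr::set_names(ColData$RCAQA)                # name each piece by its RCAQA value
```

The individual data frames are then available as `dfs$XX`, `dfs$YY` and `dfs$ZZ`.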