Grabbing data from columns with the same name in R

I currently have two data frames:

df1=data.frame(q1 = c(1:3),
               q2 = c("One" , "Two" , "Three") , 
               q3 = c(100,231,523),
               q4 = c("red", "green", "blue"),
               q1.2 = c(20:22),
               q2.2 = c("Six" , "Ten" , "Twenty") , 
               q3.2 = c(5,900,121),
               q4.2 = c("purple", "yellow", "white"))
df2=data.frame(x1 = c("q1" , "q2.1" , "q3.2" , "q4.2") ,
               x2 = c("q2" , "q3" , "q3.3" , "q4.4") ,
               x3 = c("q3" , "q2.4" , "q3.3" , "q4.6"), 
               x4 = c("q4" , "q3.6" , "q3.3" , "q4.2"))

I need to create 4 different tables. The headers of each table are taken from one row of df2, while the observations have to be obtained from df1. As you may notice, some of the headers in df2 do not exist in df1. I want each of my 4 tables to include all 4 headers, whether or not they exist in df1; if a header does not exist, its data should be blank.

I am currently using this code:

for (i in 1:nrow(df2)) {
  colnames(df2)<- df2[i,]
  tabla_temp = df1[intersect(names(df1), names(df2))]
  tname <- paste0("tabla_", i)
  assign(tname, tabla_temp)
  rm(tabla_temp)
}

The loop runs, but I get tables with different numbers of columns (only the columns that exist in df1).

Any idea how I can get my loop to create tables of the same size, where headers that do not exist in df1 get blank observations instead?

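# Note: lapply() iterates over the columns of df2 (x1..x4), and merge() sorts the rows.
lapply(df2, function(x) {
  # levels(factor(x)) also works when x is a character vector (the default in R >= 4.0)
  cols <- levels(factor(x))
  merge(
    df1[names(df1) %in% cols],
    # zero-row data frame that supplies the headers missing from df1
    read.table(text = "", col.names = cols), all = TRUE)
})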

$x1
  q1 q3.2   q4.2 q2.1
1  1    5 purple   NA
2  2  900 yellow   NA
3  3  121  white   NA

$x2
     q2  q3 q3.3 q4.4
1   One 100   NA   NA
2 Three 523   NA   NA
3   Two 231   NA   NA

$x3
   q3 q2.4 q3.3 q4.6
1 100   NA   NA   NA
2 231   NA   NA   NA
3 523   NA   NA   NA

$x4
     q4   q4.2 q3.3 q3.6
1  blue  white   NA   NA
2 green yellow   NA   NA
3   red purple   NA   NA

Data

df1=data.frame(q1 = c(1:3),
               q2 = c("One" , "Two" , "Three") , 
               q3 = c(100,231,523),
               q4 = c("red", "green", "blue"),
               q1.2 = c(20:22),
               q2.2 = c("Six" , "Ten" , "Twenty") , 
               q3.2 = c(5,900,121),
               q4.2 = c("purple", "yellow", "white"))

df2=data.frame(x1 = c("q1" , "q2.1" , "q3.2" , "q4.2") ,
               x2 = c("q2" , "q3" , "q3.3" , "q4.4") ,
               x3 = c("q3" , "q2.4" , "q3.3" , "q4.6"), 
               x4 = c("q4" , "q3.6" , "q3.3" , "q4.2"))
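
The lapply() call in this answer walks over the columns of df2, and merge() reorders both rows and columns. If the headers should instead come from each row of df2, in their original order, a base R version could look like the sketch below. Here build_table is a hypothetical helper name (it is not part of any package), NA stands in for "blank", and stringsAsFactors = FALSE is set explicitly so the data behave the same on older R versions.

```r
df1 <- data.frame(q1 = 1:3,
                  q2 = c("One", "Two", "Three"),
                  q3 = c(100, 231, 523),
                  q4 = c("red", "green", "blue"),
                  q1.2 = 20:22,
                  q2.2 = c("Six", "Ten", "Twenty"),
                  q3.2 = c(5, 900, 121),
                  q4.2 = c("purple", "yellow", "white"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x1 = c("q1", "q2.1", "q3.2", "q4.2"),
                  x2 = c("q2", "q3", "q3.3", "q4.4"),
                  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
                  x4 = c("q4", "q3.6", "q3.3", "q4.2"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one column per requested header, in the order given.
# Headers found in `source` are copied; the others stay as all-NA columns.
build_table <- function(headers, source) {
  headers <- as.character(unlist(headers))
  # Start from an all-NA data frame of the final shape
  out <- data.frame(matrix(NA, nrow = nrow(source), ncol = length(headers)))
  for (j in seq_along(headers)) {
    if (headers[j] %in% names(source)) {
      out[[j]] <- source[[headers[j]]]
    }
  }
  # Assigning names afterwards keeps repeated headers (e.g. "q3.3") as they are
  names(out) <- headers
  out
}

# One table per row of df2
tables <- lapply(seq_len(nrow(df2)), function(i) build_table(df2[i, ], df1))
```

Each element of tables has four columns. If truly empty cells are needed for display, the NA values could be replaced with "" afterwards, at the cost of turning numeric columns into character columns.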

You can do it using the any_of() function from dplyr. It selects the variables that match the given names and ignores those that do not. I will use a list to store the data frames from the loop. They can be accessed using df_modified[[i]].

# Loading libraries
library(tidyverse)

df_modified = list()
for(i in seq_len(nrow(df2)))
{
   # Headers for table i: the values in row i of df2
   vars = as.character(unlist(df2[i,]))
   df_modified[[i]] = df1 %>% 
      select(any_of(vars))
}

Output

> df_modified
[[1]]
  q1    q2  q3    q4
1  1   One 100   red
2  2   Two 231 green
3  3 Three 523  blue

[[2]]
   q3
1 100
2 231
3 523

[[3]]
  q3.2
1    5
2  900
3  121

[[4]]
    q4.2
1 purple
2 yellow
3  white
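
As the output shows, any_of() silently drops the headers that are absent from df1, so the tables still differ in width. One possible way to pad them, sketched below, is to add the missing headers as NA columns after the selection and then restore the requested order. The names df_padded and missing_vars are made up for this illustration, and the sketch assumes that repeated headers within a row (such as "q3.3" in row 3) may be collapsed to a single column.

```r
library(dplyr)

df1 <- data.frame(q1 = 1:3,
                  q2 = c("One", "Two", "Three"),
                  q3 = c(100, 231, 523),
                  q4 = c("red", "green", "blue"),
                  q1.2 = 20:22,
                  q2.2 = c("Six", "Ten", "Twenty"),
                  q3.2 = c(5, 900, 121),
                  q4.2 = c("purple", "yellow", "white"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x1 = c("q1", "q2.1", "q3.2", "q4.2"),
                  x2 = c("q2", "q3", "q3.3", "q4.4"),
                  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
                  x4 = c("q4", "q3.6", "q3.3", "q4.2"),
                  stringsAsFactors = FALSE)

df_padded <- lapply(seq_len(nrow(df2)), function(i) {
  # Distinct headers requested by row i of df2 (assumption: duplicates collapse)
  vars <- unique(as.character(unlist(df2[i, ])))
  # Keep the columns that exist in df1
  out <- df1 %>% select(any_of(vars))
  # Add every header that df1 lacks as an all-NA column
  missing_vars <- setdiff(vars, names(out))
  for (v in missing_vars) {
    out[[v]] <- NA
  }
  # Put the columns back in the order given in df2
  out[vars]
})
```

With this version, df_padded[[2]] has the columns q2.1, q3, q2.4 and q3.6, of which only q3 holds data from df1.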
