Grabbing data from columns with the same name in R
I currently have 2 data frames:
df1=data.frame(q1 = c(1:3),
q2 = c("One" , "Two" , "Three") ,
q3 = c(100,231,523),
q4 = c("red", "green", "blue"),
q1.2 = c(20:22),
q2.2 = c("Six" , "Ten" , "Twenty") ,
q3.2 = c(5,900,121),
q4.2 = c("purple", "yellow", "white"))
df2=data.frame(x1 = c("q1" , "q2.1" , "q3.2" , "q4.2") ,
x2 = c("q2" , "q3" , "q3.3" , "q4.4") ,
x3 = c("q3" , "q2.4" , "q3.3" , "q4.6"),
x4 = c("q4" , "q3.6" , "q3.3" , "q4.2"))
I need to create 4 different tables. The headers of these tables are each of the rows in df2, while the observations have to be obtained from df1. As you may have noticed, some of the headers in df2 do not exist in df1. I want my 4 tables to include all 4 headers (whether they exist in df1 or not); if a header does not exist, its data should be blank.
I am currently using this code:
for (i in 1:nrow(df2)) {
colnames(df2)<- df2[i,]
tabla_temp = df1[intersect(names(df1), names(df2))]
tname <- paste0("tabla_", i)
assign(tname, tabla_temp)
rm(tabla_temp)
}
My loop runs, but I get tables with different numbers of columns (only the columns that exist in df1).
Any idea how I can get my loop to create tables of the same size, with blank observations for the headers that do not exist?
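# Note: levels(x) assumes the columns of df2 are factors
# (stringsAsFactors = TRUE, the default before R 4.0).
lapply(df2, function(x) {
  merge(
    df1[names(df1) %in% levels(x)],
    # zero-row data frame that carries every requested header
    read.table(text = "", col.names = levels(x)), all = TRUE)
})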
$x1
q1 q3.2 q4.2 q2.1
1 1 5 purple NA
2 2 900 yellow NA
3 3 121 white NA
$x2
q2 q3 q3.3 q4.4
1 One 100 NA NA
2 Three 523 NA NA
3 Two 231 NA NA
$x3
q3 q2.4 q3.3 q4.6
1 100 NA NA NA
2 231 NA NA NA
3 523 NA NA NA
$x4
q4 q4.2 q3.3 q3.6
1 blue white NA NA
2 green yellow NA NA
3 red purple NA NA
Data
df1=data.frame(q1 = c(1:3),
q2 = c("One" , "Two" , "Three") ,
q3 = c(100,231,523),
q4 = c("red", "green", "blue"),
q1.2 = c(20:22),
q2.2 = c("Six" , "Ten" , "Twenty") ,
q3.2 = c(5,900,121),
q4.2 = c("purple", "yellow", "white"))
df2=data.frame(x1 = c("q1" , "q2.1" , "q3.2" , "q4.2") ,
x2 = c("q2" , "q3" , "q3.3" , "q4.4") ,
x3 = c("q3" , "q2.4" , "q3.3" , "q4.6"),
x4 = c("q4" , "q3.6" , "q3.3" , "q4.2"))
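Since `levels(x)` returns `NULL` for character columns, the snippet above depends on `df2` holding factors, and it builds one table per column of `df2`. A minimal base R sketch that avoids factors and works row by row, as the question asks, might look like this. The helper name `pad_table` and the result name `tables` are hypothetical, and keeping repeated names in a row (such as `q3.3` in row 3) as repeated columns is an assumption about the desired behaviour.

```r
df1 <- data.frame(
  q1 = 1:3,
  q2 = c("One", "Two", "Three"),
  q3 = c(100, 231, 523),
  q4 = c("red", "green", "blue"),
  q1.2 = 20:22,
  q2.2 = c("Six", "Ten", "Twenty"),
  q3.2 = c(5, 900, 121),
  q4.2 = c("purple", "yellow", "white"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  x1 = c("q1", "q2.1", "q3.2", "q4.2"),
  x2 = c("q2", "q3", "q3.3", "q4.4"),
  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
  x4 = c("q4", "q3.6", "q3.3", "q4.2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one table whose headers are `headers`,
# taking the column from `source` when it exists and NA otherwise.
pad_table <- function(source, headers) {
  cols <- lapply(headers, function(h) {
    if (h %in% names(source)) source[[h]] else rep(NA, nrow(source))
  })
  names(cols) <- headers
  # check.names = FALSE keeps the headers exactly as given,
  # including repeated names (an assumption, see above)
  data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
}

# One table per row of df2, stored in a named list instead of assign()
tables <- lapply(seq_len(nrow(df2)), function(i) {
  pad_table(df1, unlist(df2[i, ], use.names = FALSE))
})
names(tables) <- paste0("tabla_", seq_along(tables))
```

Each table can then be read with, for example, `tables[["tabla_2"]]`. Keeping the results in a list rather than calling `assign()` is a design choice that makes it easier to loop over them later.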
You can do it using the any_of() function from dplyr. It selects the variables that match the names and ignores those that do not. I will use a list to store the data frames from the loop. They can be accessed using df_modified[[i]].
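# Loading libraries
library(tidyverse)
df_modified = list()
for (i in seq_len(nrow(df2))) {
  # headers requested by row i of df2, as a plain character vector
  vars = as.character(unlist(df2[i, ], use.names = FALSE))
  # keep only the columns of df1 whose names appear in vars
  df_modified[[i]] = df1 %>%
    select(any_of(vars))
}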
Output
> df_modified
[[1]]
q1 q2 q3 q4
1 1 One 100 red
2 2 Two 231 green
3 3 Three 523 blue
[[2]]
q3
1 100
2 231
3 523
[[3]]
q3.2
1 5
2 900
3 121
[[4]]
q4.2
1 purple
2 yellow
3 white
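The tables above still have different numbers of columns, because `any_of()` silently drops names that are not in df1. One possible way to pad the missing headers with `NA`, sketched here under a few assumptions: the helper name `pad_with_na` and the result name `df_padded` are hypothetical, and repeated names in a row are collapsed to one column, so rows 3 and 4 of df2 yield fewer than four columns.

```r
library(dplyr)

df1 <- data.frame(
  q1 = 1:3,
  q2 = c("One", "Two", "Three"),
  q3 = c(100, 231, 523),
  q4 = c("red", "green", "blue"),
  q1.2 = 20:22,
  q2.2 = c("Six", "Ten", "Twenty"),
  q3.2 = c(5, 900, 121),
  q4.2 = c("purple", "yellow", "white")
)
df2 <- data.frame(
  x1 = c("q1", "q2.1", "q3.2", "q4.2"),
  x2 = c("q2", "q3", "q3.3", "q4.4"),
  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
  x4 = c("q4", "q3.6", "q3.3", "q4.2")
)

# Hypothetical helper: select the requested columns and add an
# all-NA column for every requested name that `data` lacks.
pad_with_na <- function(data, vars) {
  vars <- unique(vars)  # assumption: repeated names collapse to one column
  out <- select(data, any_of(vars))
  for (v in setdiff(vars, names(data))) {
    out[[v]] <- NA  # recycled to the number of rows
  }
  # restore the order in which the headers were requested
  select(out, all_of(vars))
}

df_padded <- lapply(seq_len(nrow(df2)), function(i) {
  pad_with_na(df1, as.character(unlist(df2[i, ], use.names = FALSE)))
})
```

`df_padded[[2]]` then has the columns `q2.1`, `q3`, `q2.4`, `q3.6`, with `NA` everywhere except `q3`.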