I currently have two data frames:
df1 <- data.frame(q1 = c(1:3),
                  q2 = c("One", "Two", "Three"),
                  q3 = c(100, 231, 523),
                  q4 = c("red", "green", "blue"),
                  q1.2 = c(20:22),
                  q2.2 = c("Six", "Ten", "Twenty"),
                  q3.2 = c(5, 900, 121),
                  q4.2 = c("purple", "yellow", "white"))
df2 <- data.frame(x1 = c("q1", "q2.1", "q3.2", "q4.2"),
                  x2 = c("q2", "q3", "q3.3", "q4.4"),
                  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
                  x4 = c("q4", "q3.6", "q3.3", "q4.2"))
I need to create 4 different tables. The headers of each table are the values in one row of df2, while the observations have to come from df1. As you may have noticed, some of the headers listed in df2 do not exist in df1. I want each of my 4 tables to include all 4 headers, whether they exist in df1 or not; if a header does not exist, its data should be blank.
I am currently using this code:
for (i in 1:nrow(df2)) {
  colnames(df2) <- df2[i,]
  tabla_temp = df1[intersect(names(df1), names(df2))]
  tname <- paste0("tabla_", i)
  assign(tname, tabla_temp)
  rm(tabla_temp)
}
The loop runs, but the resulting tables have different numbers of columns, because each one contains only the headers that exist in df1.
Any idea how I can get the loop to create tables of the same size, with blank observations under the headers that do not exist in df1?
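lapply(df2, function(x) {
  # levels(x) only works on factor columns; since R 4.0 data.frame() keeps
  # strings as character, so take the sorted unique values instead
  headers <- sort(unique(as.character(x)))
  merge(
    df1[names(df1) %in% headers],
    # Zero-row template that carries every requested header
    read.table(text = "", col.names = headers), all = TRUE)
})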
$x1
  q1 q3.2   q4.2 q2.1
1  1    5 purple   NA
2  2  900 yellow   NA
3  3  121  white   NA

$x2
     q2  q3 q3.3 q4.4
1   One 100   NA   NA
2 Three 523   NA   NA
3   Two 231   NA   NA

$x3
   q3 q2.4 q3.3 q4.6
1 100   NA   NA   NA
2 231   NA   NA   NA
3 523   NA   NA   NA

$x4
     q4   q4.2 q3.3 q3.6
1  blue  white   NA   NA
2 green yellow   NA   NA
3   red purple   NA   NA
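Note that lapply(df2, ...) above walks over the columns of df2, and merge() sorts the rows by the shared columns. If the headers should instead come from each row of df2, as the question describes, a minimal base R sketch could look like the following. The names build_table and tables are hypothetical, introduced only for this illustration, and NA stands in for "blank".

```r
# Same data as in the question; stringsAsFactors = FALSE is set explicitly so
# the sketch behaves the same before and after R 4.0.
df1 <- data.frame(q1 = 1:3,
                  q2 = c("One", "Two", "Three"),
                  q3 = c(100, 231, 523),
                  q4 = c("red", "green", "blue"),
                  q1.2 = 20:22,
                  q2.2 = c("Six", "Ten", "Twenty"),
                  q3.2 = c(5, 900, 121),
                  q4.2 = c("purple", "yellow", "white"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x1 = c("q1", "q2.1", "q3.2", "q4.2"),
                  x2 = c("q2", "q3", "q3.3", "q4.4"),
                  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
                  x4 = c("q4", "q3.6", "q3.3", "q4.2"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: one table whose columns are `headers`, in that order.
build_table <- function(data, headers) {
  cols <- lapply(headers, function(h) {
    # Take the column from `data` when it exists, otherwise fill with NA
    if (h %in% names(data)) data[[h]] else rep(NA, nrow(data))
  })
  names(cols) <- headers
  # check.names = FALSE keeps repeated headers such as "q3.3" unchanged
  do.call(data.frame,
          c(cols, list(check.names = FALSE, stringsAsFactors = FALSE)))
}

# One table per row of df2, stored in a named list instead of via assign()
tables <- lapply(seq_len(nrow(df2)), function(i) {
  build_table(df1, as.character(unlist(df2[i, ])))
})
names(tables) <- paste0("tabla_", seq_len(nrow(df2)))
```

Each element, for example tables$tabla_2, then has four columns and keeps the row order of df1. If the tables are exported, write.csv() accepts na = "" so that the NA cells are written as empty fields.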
You can do this with the any_of() function from dplyr. It selects the variables whose names match and ignores those that do not, so headers that are missing from df1 are dropped rather than returned as blank columns. I will use a list to store the data frames produced by the loop; each one can be accessed with df_modified[[i]].
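# Load the tidyverse (dplyr provides select() and any_of())
library(tidyverse)
df_modified <- list()
for (i in seq_len(nrow(df2))) {
  # Header names requested by row i of df2 (also safe if the columns are factors)
  vars <- as.character(unlist(df2[i, ]))
  # Keep the columns of df1 named in vars; names not found are skipped
  df_modified[[i]] <- df1 %>%
    select(any_of(vars))
}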
Output
> df_modified
[[1]]
  q1    q2  q3    q4
1  1   One 100   red
2  2   Two 231 green
3  3 Three 523  blue

[[2]]
   q3
1 100
2 231
3 523

[[3]]
  q3.2
1    5
2  900
3  121

[[4]]
    q4.2
1 purple
2 yellow
3  white
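As the output shows, any_of() leaves out the headers that df1 lacks. One possible way to keep them as blank columns is to add an all-NA column for every missing name before selecting. This is only a sketch: select_or_na and df_padded are hypothetical names, not part of dplyr, and NA is used as the blank value.

```r
library(dplyr)

# Same data as in the question
df1 <- data.frame(q1 = 1:3,
                  q2 = c("One", "Two", "Three"),
                  q3 = c(100, 231, 523),
                  q4 = c("red", "green", "blue"),
                  q1.2 = 20:22,
                  q2.2 = c("Six", "Ten", "Twenty"),
                  q3.2 = c(5, 900, 121),
                  q4.2 = c("purple", "yellow", "white"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x1 = c("q1", "q2.1", "q3.2", "q4.2"),
                  x2 = c("q2", "q3", "q3.3", "q4.4"),
                  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
                  x4 = c("q4", "q3.6", "q3.3", "q4.2"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: pad `data` with NA columns, then select in order.
select_or_na <- function(data, vars) {
  vars <- unique(vars)  # select() returns each column once, so drop repeats
  for (v in setdiff(vars, names(data))) {
    data[[v]] <- NA     # the single NA is recycled to every row
  }
  data %>% select(all_of(vars))
}

df_padded <- lapply(seq_len(nrow(df2)), function(i) {
  select_or_na(df1, as.character(unlist(df2[i, ])))
})
```

Because repeated names are collapsed, rows 3 and 4 of df2 (which list q3.3 and q4.2 more than once) yield fewer than four columns here. A table that must show the same header several times would need a different construction.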