Grabbing data from columns with the same name in R

I currently have two data frames:

df1=data.frame(q1 = c(1:3),
               q2 = c("One" , "Two" , "Three") , 
               q3 = c(100,231,523),
               q4 = c("red", "green", "blue"),
               q1.2 = c(20:22),
               q2.2 = c("Six" , "Ten" , "Twenty") , 
               q3.2 = c(5,900,121),
               q4.2 = c("purple", "yellow", "white"))
df2=data.frame(x1 = c("q1" , "q2.1" , "q3.2" , "q4.2") ,
               x2 = c("q2" , "q3" , "q3.3" , "q4.4") ,
               x3 = c("q3" , "q2.4" , "q3.3" , "q4.6"), 
               x4 = c("q4" , "q3.6" , "q3.3" , "q4.2"))

I need to create 4 different tables. The headers of each table are given by one row of df2, while the observations have to be obtained from df1. As you may have noticed, some of the headers listed in df2 do not exist in df1. I want each of my 4 tables to include all 4 headers (whether they exist in df1 or not), and if a header does not exist, its data should be blank.

I am currently using this code

for (i in 1:nrow(df2)) {
  colnames(df2)<- df2[i,]
  tabla_temp = df1[intersect(names(df1), names(df2))]
  tname <- paste0("tabla_", i)
  assign(tname, tabla_temp)
  rm(tabla_temp)
}

My loop runs, but I get tables with different numbers of columns (only the columns that exist in df1).

Any idea how I can get my loop to create tables of the same size, with blank observations for the headers that do not exist?

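# Builds one table per column of df2.
# levels(factor(x)) works whether x is a factor (the default before R 4.0) or a character vector.
lapply(df2, function(x) {
  wanted <- levels(factor(x))
  merge(
    df1[names(df1) %in% wanted],
    read.table(text = "", col.names = wanted),  # zero-row template with every requested header
    all = TRUE)
})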

$x1
  q1 q3.2   q4.2 q2.1
1  1    5 purple   NA
2  2  900 yellow   NA
3  3  121  white   NA

$x2
     q2  q3 q3.3 q4.4
1   One 100   NA   NA
2 Three 523   NA   NA
3   Two 231   NA   NA

$x3
   q3 q2.4 q3.3 q4.6
1 100   NA   NA   NA
2 231   NA   NA   NA
3 523   NA   NA   NA

$x4
     q4   q4.2 q3.3 q3.6
1  blue  white   NA   NA
2 green yellow   NA   NA
3   red purple   NA   NA
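To see why the merge-based approach pads the missing headers with NA, here is a minimal sketch of the idiom on its own. It assumes a cut-down df1 with only two columns, and the names wanted, template and result are illustrative, not from the original answer. read.table(text = "", col.names = ...) yields a zero-row data frame carrying every requested header, and merge(..., all = TRUE) keeps all rows of df1 while adding the template's extra columns.

```r
# Cut-down version of df1 (assumption: two columns are enough to show the idea)
df1 <- data.frame(q2 = c("One", "Two", "Three"),
                  q3 = c(100, 231, 523))

# Headers requested for one table; "q3.3" does not exist in df1
wanted <- c("q2", "q3.3")

# Zero-row data frame that carries every requested header
template <- read.table(text = "", col.names = wanted)

# Outer join on the shared column(s): every df1 row is kept and the
# columns found only in the template are filled with NA
result <- merge(df1[names(df1) %in% wanted], template, all = TRUE)
print(result)
```

Note that merge() sorts the result on the key columns by default (sort = TRUE), which is why the rows in the output above do not always follow the original order of df1.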

data

df1=data.frame(q1 = c(1:3),
               q2 = c("One" , "Two" , "Three") , 
               q3 = c(100,231,523),
               q4 = c("red", "green", "blue"),
               q1.2 = c(20:22),
               q2.2 = c("Six" , "Ten" , "Twenty") , 
               q3.2 = c(5,900,121),
               q4.2 = c("purple", "yellow", "white"))

df2=data.frame(x1 = c("q1" , "q2.1" , "q3.2" , "q4.2") ,
               x2 = c("q2" , "q3" , "q3.3" , "q4.4") ,
               x3 = c("q3" , "q2.4" , "q3.3" , "q4.6"), 
               x4 = c("q4" , "q3.6" , "q3.3" , "q4.2"))

You can do it using the any_of() function from dplyr. It selects the variables whose names match and ignores those that do not, so headers missing from df1 are dropped rather than filled with blanks. I will use a list to store the data frames from the loop. They can be accessed using df_modified[[i]].

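# Load dplyr for select(), any_of() and the pipe
library(dplyr)

df_modified <- list()
for (i in seq_len(nrow(df2))) {
  # Headers requested by row i of df2, as a plain character vector
  vars <- as.character(unlist(df2[i, ]))
  # Keep only the requested columns that exist in df1
  df_modified[[i]] <- df1 %>%
    select(any_of(vars))
}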

Output

> df_modified
[[1]]
  q1    q2  q3    q4
1  1   One 100   red
2  2   Two 231 green
3  3 Three 523  blue

[[2]]
   q3
1 100
2 231
3 523

[[3]]
  q3.2
1    5
2  900
3  121

[[4]]
    q4.2
1 purple
2 yellow
3  white
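Since the question asks for tables that all have 4 columns, one possible way to get there is sketched below in base R. This is an illustrative sketch rather than the method of either answer: make_table and tables are hypothetical names, and it assumes that "blank" can be represented as NA. For each row of df2 it builds one column per header, copying the column from df1 when it exists and using NA otherwise. Repeated headers (for example q3.3 in row 3) are kept as repeated column names.

```r
df1 <- data.frame(q1 = 1:3,
                  q2 = c("One", "Two", "Three"),
                  q3 = c(100, 231, 523),
                  q4 = c("red", "green", "blue"),
                  q1.2 = 20:22,
                  q2.2 = c("Six", "Ten", "Twenty"),
                  q3.2 = c(5, 900, 121),
                  q4.2 = c("purple", "yellow", "white"))
df2 <- data.frame(x1 = c("q1", "q2.1", "q3.2", "q4.2"),
                  x2 = c("q2", "q3", "q3.3", "q4.4"),
                  x3 = c("q3", "q2.4", "q3.3", "q4.6"),
                  x4 = c("q4", "q3.6", "q3.3", "q4.2"))

# Hypothetical helper: one column per requested header, NA when the
# header is not a column of `data`
make_table <- function(headers, data) {
  cols <- lapply(headers, function(h) {
    if (h %in% names(data)) data[[h]] else rep(NA, nrow(data))
  })
  names(cols) <- headers
  # check.names = FALSE keeps the headers exactly as given, duplicates included
  do.call(data.frame,
          c(cols, list(check.names = FALSE, stringsAsFactors = FALSE)))
}

# One table per row of df2, stored in a named list instead of assign()
tables <- lapply(seq_len(nrow(df2)), function(i) {
  make_table(as.character(unlist(df2[i, ])), df1)
})
names(tables) <- paste0("tabla_", seq_along(tables))
```

Each table is then available as tables$tabla_1, tables$tabla_2, and so on. Keeping them in a list avoids creating variables with assign() and makes it easy to loop over the results later.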
