How to create subsets of a dataframe based on columns using a for loop in R

I have a dataframe which looks like this:

  id age1 sex1 age2 sex2 age3 sex3 age4   sex4
1  5   20 <NA>   NA <NA>   NA <NA>   27 Female
2 25   NA <NA>   NA <NA>   NA <NA>   35 Female
3 65   NA <NA>   NA <NA>   NA <NA>   NA   <NA>

This is the code for the data:

temp <- structure(list(id = c(5L, 25L, 65L, 25L, 65L, 5L, 5L, 85L, 285L, 
541L), age1 = c(20L, NA, NA, NA, NA, NA, NA, NA, NA, NA), sex1 = structure(c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = c("missing", 
"inapplicable", "refusal", "don't know", "inconsistent", "Male", 
"Female"), class = "factor"), age2 = c(NA, NA, NA, NA, 31L, 
NA, NA, NA, NA, NA), sex2 = structure(c(NA, NA, NA, NA, 7L, 
NA, NA, NA, NA, NA), .Label = c("missing", "inapplicable", "refusal", 
"don't know", "inconsistent", "Male", "Female"), class = "factor"), 
    age3 = c(NA, NA, NA, NA, 32L, NA, NA, NA, 25L, 23L), sex3 = structure(c(NA, 
    NA, NA, NA, 7L, NA, NA, NA, 6L, 7L), .Label = c("missing", 
    "inapplicable", "refusal", "don't know", "inconsistent", 
    "Male", "Female"), class = "factor"), age4 = c(27L, 35L, 
    NA, NA, 33L, NA, 24L, NA, 26L, NA), sex4 = structure(c(7L, 
    7L, NA, NA, 7L, NA, 7L, NA, 6L, NA), .Label = c("missing", 
    "inapplicable", "refusal", "don't know", "inconsistent", 
    "Male", "Female"), class = "factor")), row.names = c(NA, 
10L), class = "data.frame")

I would like to know how to make multiple subsets of the data based on the columns.

I know I could do this with the following code:

Subset1 <- temp[, 1:3]
Subset2 <- temp[, c(1, 4:5)]
Subset3 <- temp[, c(1, 6:7)]

But there must be a more concise way to do this. I've tried a for loop, but I'm new to R and don't know how to do this while keeping the names of the new subsets consistent.

We can use split.default to split the columns by the number in their names, then prepend the first column (id) to each data frame in the resulting list.

new_list <- lapply(split.default(temp[-1], gsub("\\D", "", names(temp)[-1])), 
                   function(x) cbind(temp[1], x))
new_list

#$`1`
#    id age1 sex1
#1    5   20 <NA>
#2   25   NA <NA>
#3   65   NA <NA>
#4   25   NA <NA>
#5   65   NA <NA>
#6    5   NA <NA>
#7    5   NA <NA>
#8   85   NA <NA>
#9  285   NA <NA>
#10 541   NA <NA>

#$`2`
#    id age2   sex2
#1    5   NA   <NA>
#...
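
To see why this works, it can help to look at the grouping key on its own. The following is a minimal sketch on a small made-up data frame (`toy` and its values are illustrative assumptions, not the data from the question) that shows what gsub produces and how split.default groups the columns by it:

```r
# Toy data with the same column layout as `temp` (values are illustrative only)
toy <- data.frame(id   = c(5L, 25L),
                  age1 = c(20L, NA), sex1 = c(NA, "Female"),
                  age2 = c(NA, 31L), sex2 = c(NA, "Female"))

# Remove every non-digit character from the non-id column names,
# leaving only the group number of each column
key <- gsub("\\D", "", names(toy)[-1])
key

# split.default() treats the data frame as a list of columns and
# groups those columns by the key
parts <- split.default(toy[-1], key)
lapply(parts, names)
```

Columns that share the same digits end up in the same list element, so this approach does not depend on the columns being in any particular order.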

This returns a list of data frames. If you want them as separate data frames in the global environment, we can do:

names(new_list) <- paste0('Subset', seq_along(new_list))
list2env(new_list, .GlobalEnv)
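
Note that list2env into .GlobalEnv overwrites any existing objects with the same names. As a hedged alternative, the sketch below either keeps the subsets in the named list or exports them into a separate environment; `toy`, `toy_list` and `subsets_env` are made-up names used only for illustration:

```r
# Toy stand-in for the named list of subsets (illustrative values only)
toy <- data.frame(id = c(5L, 25L), age1 = c(20L, NA), age2 = c(NA, 31L))
toy_list <- list(Subset1 = toy[c("id", "age1")],
                 Subset2 = toy[c("id", "age2")])

# Option 1: leave the subsets in the list and pull them out by name
toy_list[["Subset2"]]

# Option 2: export into a dedicated environment instead of .GlobalEnv,
# so existing objects called Subset1, Subset2, ... are left untouched
subsets_env <- new.env()
list2env(toy_list, envir = subsets_env)
ls(subsets_env)
get("Subset1", envir = subsets_env)
```

Keeping the data frames in a list is often more convenient, because the same operation can then be applied to all of them with lapply.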

Here is another base R solution:

ind <- 1:4
list2env(setNames(lapply(ind, function(k) subset(temp,select = c(1,2*k+(0:1)))),
                  paste0("Subset",ind)),
         envir = .GlobalEnv)

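Here lapply loops over the group indices, and for each group k, subset(select = ...) picks the id column plus columns 2k and 2k + 1. This relies on the columns being ordered as id followed by consecutive age/sex pairs.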
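
Since the question asked specifically about a for loop, the same index arithmetic can also be written that way. This is only a sketch on made-up data (`toy`, `n_groups` and `subsets` are illustrative names), and it assumes exactly two columns per group after the id column:

```r
# Toy data with the same column layout as `temp` (values are illustrative only)
toy <- data.frame(id   = c(5L, 25L),
                  age1 = c(20L, NA), sex1 = c(NA, "Female"),
                  age2 = c(NA, 31L), sex2 = c(NA, "Female"))

# Assumption: every group has exactly two columns (age, sex) after id
n_groups <- (ncol(toy) - 1) / 2

# Pre-allocate a list and fill one element per group
subsets <- vector("list", n_groups)
for (k in seq_len(n_groups)) {
  # id column plus the k-th pair: columns 2k and 2k + 1
  subsets[[k]] <- toy[, c(1, 2 * k, 2 * k + 1)]
}

# Consistent names: Subset1, Subset2, ...
names(subsets) <- paste0("Subset", seq_len(n_groups))
subsets$Subset2
```

Storing the results in a list keeps the naming consistent without calling assign() inside the loop; list2env(subsets, .GlobalEnv) can still be used afterwards if separate objects are needed.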
