I have a dataframe which looks like this:
   id age1 sex1 age2 sex2 age3 sex3 age4   sex4
1   5   20 <NA>   NA <NA>   NA <NA>   27 Female
2  25   NA <NA>   NA <NA>   NA <NA>   35 Female
3  65   NA <NA>   NA <NA>   NA <NA>   NA   <NA>
This is the code for the data:
temp <- structure(list(id = c(5L, 25L, 65L, 25L, 65L, 5L, 5L, 85L, 285L,
541L), age1 = c(20L, NA, NA, NA, NA, NA, NA, NA, NA, NA), sex1 = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = c("missing",
"inapplicable", "refusal", "don't know", "inconsistent", "Male",
"Female"), class = "factor"), age2 = c(NA, NA, NA, NA, 31L,
NA, NA, NA, NA, NA), sex2 = structure(c(NA, NA, NA, NA, 7L,
NA, NA, NA, NA, NA), .Label = c("missing", "inapplicable", "refusal",
"don't know", "inconsistent", "Male", "Female"), class = "factor"),
age3 = c(NA, NA, NA, NA, 32L, NA, NA, NA, 25L, 23L), sex3 = structure(c(NA,
NA, NA, NA, 7L, NA, NA, NA, 6L, 7L), .Label = c("missing",
"inapplicable", "refusal", "don't know", "inconsistent",
"Male", "Female"), class = "factor"), age4 = c(27L, 35L,
NA, NA, 33L, NA, 24L, NA, 26L, NA), sex4 = structure(c(7L,
7L, NA, NA, 7L, NA, 7L, NA, 6L, NA), .Label = c("missing",
"inapplicable", "refusal", "don't know", "inconsistent",
"Male", "Female"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
I would like to know how to make multiple subsets of the data based on the columns.
I know I could do this with the following code:
Subset1 <- temp[, 1:3]
Subset2 <- temp[, c(1, 4:5)]
Subset3 <- temp[, c(1, 6:7)]
But there must be a more concise way to do this. I've tried a for loop, but I'm new to R and don't know how to do this while keeping the names of the new subsets consistent.
We can use split.default to split the data on the number in the column names, then prepend the first column (id) to each element of the resulting list.
new_list <- lapply(split.default(temp[-1], gsub("\\D", "", names(temp)[-1])),
function(x) cbind(temp[1], x))
new_list
#$`1`
#    id age1 sex1
#1    5   20 <NA>
#2   25   NA <NA>
#3   65   NA <NA>
#4   25   NA <NA>
#5   65   NA <NA>
#6    5   NA <NA>
#7    5   NA <NA>
#8   85   NA <NA>
#9  285   NA <NA>
#10 541   NA <NA>
#$`2`
#    id age2 sex2
#1    5   NA <NA>
#...
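To see why this grouping works, it can help to look at the key that gsub produces. The following is a minimal sketch on a made-up data frame (`toy` and its values are assumptions for illustration, not the asker's data) with the same column layout:

# Hypothetical toy data with the same naming pattern as `temp`.
toy <- data.frame(id = c(5L, 25L),
                  age1 = c(20L, NA), sex1 = c(NA, NA),
                  age2 = c(NA, 31L), sex2 = c(NA, "Female"))

# "\\D" matches any non-digit, so only the numeric suffix of each name is kept.
key <- gsub("\\D", "", names(toy)[-1])
key

# split.default() treats the data frame as a list of columns and groups
# the columns that share the same key.
groups <- split.default(toy[-1], key)
names(groups)
lapply(groups, names)

Note that split() turns the key into a factor, so the groups are ordered as character strings. With ten or more suffixes, "10" would sort before "2", which is worth keeping in mind when naming the results by position.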
This returns a list of data frames. If you want them as separate data frames in the global environment, we can do:
names(new_list) <- paste0('Subset', seq_along(new_list))
list2env(new_list, .GlobalEnv)
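As an alternative to list2env, the subsets can simply stay in the named list and be accessed by name. This is only a sketch on made-up data (`toy` and `subsets` are hypothetical names); it builds each name from the column suffix rather than from the position in the list, so the name and the suffix cannot drift apart:

# Hypothetical toy data; one column per suffix keeps the example short.
toy <- data.frame(id = c(5L, 25L), age1 = c(20L, NA), age2 = c(NA, 31L))

# Same split-and-cbind approach as above, applied to the toy data.
subsets <- lapply(split.default(toy[-1], gsub("\\D", "", names(toy)[-1])),
                  function(x) cbind(toy[1], x))

# Name each element after its own suffix instead of its position.
names(subsets) <- paste0("Subset", names(subsets))

subsets[["Subset2"]]      # access one subset by name
sapply(subsets, nrow)     # apply a function to every subset at once

Keeping the data frames in a list avoids filling the global environment with numbered objects and makes it easy to loop over them later.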
Here is another base R solution:
ind <- 1:4
list2env(setNames(lapply(ind, function(k) subset(temp,select = c(1,2*k+(0:1)))),
paste0("Subset",ind)),
envir = .GlobalEnv)
This uses subset inside lapply: for each k, it selects the id column plus columns 2*k and 2*k + 1.
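The positional version assumes the columns are stored in strict age/sex pairs after id. If that ordering cannot be relied on, a variant that selects columns by name may be safer. This is a sketch under that assumption, using made-up data (`toy` and `by_name` are hypothetical names):

# Hypothetical toy data with the first pair of columns deliberately out of order.
toy <- data.frame(id = c(5L, 25L),
                  sex1 = c(NA, NA), age1 = c(20L, NA),
                  age2 = c(NA, 31L), sex2 = c(NA, "Female"))

ind <- 1:2

# Build the column names for each k (e.g. "age1", "sex1") and select by name.
by_name <- setNames(
  lapply(ind, function(k) toy[c("id", paste0(c("age", "sex"), k))]),
  paste0("Subset", ind)
)

names(by_name$Subset1)

The result is a named list, which can be passed to list2env in the same way if separate objects are wanted.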