简体   繁体   中英

Create new dataframe reorganizing columns in R

I have looked at similar posts but haven't gotten anything to solve my issue.

I have a dataframe (df) with different sizes of two characteristic (S042: 4 sizes, S081: 3 sizes) as variables for 5 individuals (Sample):

  Sample S042_91 S042_93 S042_105 S042_107 S081_184 S081_187 S081_188
1   6001      91       0        0      107      184      187        0
2   6002       0      93      105        0        0        0      188
3   6005      91       0      105        0        0        0      188
4   6006       0       0        0      107        0      187      188
5   6008      91       0        0      107       NA       NA       NA

I would like to create a new dataframe reorganizing data per characteristic limiting to two columns (since each individual has one or two sizes, no more). If individual has one size, the second column must have a "0". If individual has no sizes (NA), both columns must have "NA". Also, rename characteristic column names to _1 and _2 keeping characteristic name.

New dataframe would end up looking like:

  Sample S042_1 S042_2 S081_1 S081_2
1   6001     91    107    184    187
2   6002     93    105    188      0
3   6005     91    105    188      0
4   6006    107      0    187    188
5   6008     91    107     NA     NA

Thanks a lot in advance.

Create a vector of unique column names that have different prefixes excluding the first column. This can be done using gsub by stripping out the characters starting from the _ . Then, loop this using lapply and grep this pattern to get the columns that have the same prefix. You can use apply to get the elements that are not 0 for each row followed by 0's.

size <- 1:2
nm1 <-  paste0("^", unique(gsub("\\_.*", "", colnames(df)[-1])))
dfN <- cbind(df[,1],do.call(data.frame,
         lapply(nm1, function(x) t(apply(df[grep(x, colnames(df))],1,
                function(x) c(x[x!=0], x[x==0])[size])))))

 colnames(dfN) <- c("Sample", paste(rep(c('SO42', 'SO81'), each=2), 1:2, sep="_"))


 dfN
 #  Sample SO42_1 SO42_2 SO81_1 SO81_2
 #1   6001     91    107   184    187
 #2   6002     93    105   188     0
 #3   6005     91    105   188     0
 #4   6006    107      0   187    188
 #5   6008     91    107    NA    NA

Another way using dplyr/tidyr

 library(dplyr)
 library(tidyr)
 df %>% 
    gather(Var, Val, -Sample) %>%
    separate(Var, c("Var1", "Var2")) %>%
    group_by(Sample, Var1) %>% 
    arrange(Val==0) %>%
    slice(1:2) %>%
    mutate(n=row_number()) %>%
    ungroup() %>%
    unite(Var, Var1, n) %>% 
    select(-Var2) %>% 
    spread(Var, Val)
 #    Sample S042_1 S042_2 S081_1 S081_2
 #1   6001     91    107    184    187
 #2   6002     93    105    188      0
 #3   6005     91    105    188      0
 #4   6006    107      0    187    188
 #5   6008     91    107     NA     NA
S4 <- apply(tmp, 1, function (x) sort(x[2:5], decreasing=TRUE)[1:2])
S8 <- apply(tmp, 1, function (x) sort(x[6:8], decreasing=TRUE)[1:2])
newdata <- cbind(t(S4), t(S8))

Untested but should work. Renaming the columns and making it a data frame is left for you to do.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM