Merge several columns into one with specific conditions in R

I have data of the form:

Id1   Id21   c1   Id22   c2   Id23   c3   Id24   c4
1     20     5    11     9    9      20   32     10
1     40     4    14     9    13     5    36     9
1     43     3    15     3    23     1    39     8
2     47     5    17     8    11     9    10     5
2     5      4    12     8    14     8    28     4
2     6      0    10     2    24     4    23     2
3     .      .    .      .    .      .    .      .
3     .      .    .      .    .      .    .      .
3          
4
.
.
100
100
100

Each Id1 has three rows, and every row holds four pairs (Id2i, ci), i = 1, ..., 4, such that Id2i is always in increasing order and ci is always in decreasing order for each Id1. I need the output to be:

Id1    Id2     c
1       9      20
1       32     10  
1       11     9
1       14     9
1       36     9
2       11     9
2       17     8
2       12     8
2       14     8
2       47     5
.
.
.
100
100
100
100
100     .      .

That is, the output has five rows for each Id1: the five largest c values taken across all the ci columns, each with its corresponding Id2. How can this be achieved in R?

Using the development version of data.table:

# using first six rows from your post
require(data.table) # v1.9.5+
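# melt the Id2* columns into "Id2" and the c* columns into "c" in one pass
ans <- melt(setDT(df), measure = patterns("^Id2", "^c[0-9]$"),
            value.name = c("Id2", "c"))
# sort by c (descending), then keep the first 5 rows per Id1, dropping "variable"
ans[order(-c), head(.SD, 5L), by = Id1, .SDcols = !"variable"]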
#     Id1 Id2  c
#  1:   1   9 20
#  2:   1  32 10
#  3:   1  11  9
#  4:   1  14  9
#  5:   1  36  9
#  6:   2  11  9
#  7:   2  17  8
#  8:   2  12  8
#  9:   2  14  8
# 10:   2  47  5

Basically, melt can accept a list of column names (here built by patterns()) and gathers the columns from each element of the list into a separate output column: the Id2* columns are combined into Id2 and the c* columns into c.

Then we order by the c column in decreasing order, group by Id1, and pick the first 5 rows from the subset of data belonging to each group.
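
To check which columns each regular expression picks up before melting, a small base R sketch like the one below may help. It only mimics the grouping with grep and is not data.table's internal implementation; the names cols, pats and groups are illustrative.

```r
# Column names as in the question's data
cols <- c("Id1", "Id21", "c1", "Id22", "c2", "Id23", "c3", "Id24", "c4")

# The same two regular expressions that are passed to patterns()
pats <- c("^Id2", "^c[0-9]$")

# One character vector of matching column names per pattern
groups <- lapply(pats, function(p) grep(p, cols, value = TRUE))
groups
# [[1]] "Id21" "Id22" "Id23" "Id24"
# [[2]] "c1"   "c2"   "c3"   "c4"
```

Note that "^c[0-9]$" is anchored at both ends, so it would not match a column such as c10 or count.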

You can use gather from tidyr and starts_with from dplyr to do this.

require(tidyr)
require(dplyr)

df %>% 
  gather(key = "Id2_key", value = "Id2", starts_with("Id2")) %>%
  gather(key = "c_key", value = "c", starts_with("c"))
##    Id1 Id2_key Id2 c_key  c
## 1    1    Id21  20    c1  5
## 2    1    Id21  40    c1  4
## 3    1    Id21  43    c1  3
## 4    2    Id21  47    c1  5
## 5    2    Id21   5    c1  4
## 6    2    Id21   6    c1  0
## ...                     ...
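
Note that the two gather calls pair every Id2 value with every c value from the same original row, so the (Id2i, ci) pairs are no longer aligned. As a hedged sketch, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are available (both newer than the versions this answer was written for), pivot_longer with the ".value" sentinel keeps the pairs together and the top five per Id1 can then be sliced off. The names set and top5 are illustrative.

```r
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0
library(dplyr)  # slice_head() needs dplyr >= 1.0.0

# First six rows from the question
df <- data.frame(
  Id1  = c(1, 1, 1, 2, 2, 2),
  Id21 = c(20, 40, 43, 47, 5, 6),   c1 = c(5, 4, 3, 5, 4, 0),
  Id22 = c(11, 14, 15, 17, 12, 10), c2 = c(9, 9, 3, 8, 8, 2),
  Id23 = c(9, 13, 23, 11, 14, 24),  c3 = c(20, 5, 1, 9, 8, 4),
  Id24 = c(32, 36, 39, 10, 28, 23), c4 = c(10, 9, 8, 5, 4, 2)
)

top5 <- df %>%
  # "Id21" -> column Id2, set "1"; "c1" -> column c, set "1"; and so on
  pivot_longer(cols = -Id1,
               names_to = c(".value", "set"),
               names_pattern = "^(Id2|c)([0-9])$") %>%
  arrange(desc(c)) %>%      # largest c first
  group_by(Id1) %>%
  slice_head(n = 5) %>%     # first five rows within each Id1
  ungroup() %>%
  select(Id1, Id2, c)

top5
```

Which Id2 accompanies a tied c value depends on how ties are ordered, so the fifth row of a group may differ between approaches when several rows share the same c.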

# Try this (df is your original data frame):
library(reshape2)
df1 <- melt(df, measure.vars = paste0("c", 1:4), variable.name = "c", value.name = "c_value")
df2 <- melt(df1, measure.vars = paste0("Id2", 1:4), variable.name = "Id2", value.name = "Id2_value")
head(df2)

  Id1  c c_value  Id2 Id2_value
1   1 c1       5 Id21        20
2   1 c1       4 Id21        40
3   1 c1       3 Id21        43
4   2 c1       5 Id21        47
5   2 c1       4 Id21         5
6   2 c1       0 Id21         6
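
The two melt calls above pair every ci with every Id2j from the same row (96 rows for the six-row sample), so one more step is needed to reach the requested output. A possible continuation, offered as a sketch rather than as part of the original answer, keeps only the rows whose suffixes match and then takes the five largest c_value rows per Id1. The names same_pair, paired and top5 are illustrative.

```r
library(reshape2)

# First six rows from the question
df <- data.frame(
  Id1  = c(1, 1, 1, 2, 2, 2),
  Id21 = c(20, 40, 43, 47, 5, 6),   c1 = c(5, 4, 3, 5, 4, 0),
  Id22 = c(11, 14, 15, 17, 12, 10), c2 = c(9, 9, 3, 8, 8, 2),
  Id23 = c(9, 13, 23, 11, 14, 24),  c3 = c(20, 5, 1, 9, 8, 4),
  Id24 = c(32, 36, 39, 10, 28, 23), c4 = c(10, 9, 8, 5, 4, 2)
)

df1 <- melt(df, measure.vars = paste0("c", 1:4),
            variable.name = "c", value.name = "c_value")
df2 <- melt(df1, measure.vars = paste0("Id2", 1:4),
            variable.name = "Id2", value.name = "Id2_value")

# Keep only aligned pairs: c1 with Id21, c2 with Id22, and so on
same_pair <- sub("^c", "", as.character(df2$c)) ==
             sub("^Id2", "", as.character(df2$Id2))
paired <- df2[same_pair, c("Id1", "Id2_value", "c_value")]

# Sort by Id1, then by c_value in decreasing order, and keep 5 rows per Id1
paired <- paired[order(paired$Id1, -paired$c_value), ]
top5 <- do.call(rbind, lapply(split(paired, paired$Id1), head, n = 5))
top5
```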

#data
df<-
structure(list(Id1 = c(1L, 1L, 1L, 2L, 2L, 2L), Id21 = c(20L, 
40L, 43L, 47L, 5L, 6L), c1 = c(5L, 4L, 3L, 5L, 4L, 0L), Id22 = c(11L, 
14L, 15L, 17L, 12L, 10L), c2 = c(9L, 9L, 3L, 8L, 8L, 2L), Id23 = c(9L, 
13L, 23L, 11L, 14L, 24L), c3 = c(20L, 5L, 1L, 9L, 8L, 4L), Id24 = c(32L, 
36L, 39L, 10L, 28L, 23L), c4 = c(10L, 9L, 8L, 5L, 4L, 2L)), .Names = c("Id1", 
"Id21", "c1", "Id22", "c2", "Id23", "c3", "Id24", "c4"), class = "data.frame", row.names = c(NA, 
-6L))
