I have data of the form:
Id1 Id21 c1 Id22 c2 Id23 c3 Id24 c4
1 20 5 11 9 9 20 32 10
1 40 4 14 9 13 5 36 9
1 43 3 15 3 23 1 39 8
2 47 5 17 8 11 9 10 5
2 5 4 12 8 14 8 28 4
2 6 0 10 2 24 4 23 2
3 . . . . . . . .
3 . . . . . . . .
3
4
.
.
100
100
100
Each Id1 has three rows, and each row has corresponding Id2i and ci values for i in [1,4], such that Id2i is always in increasing order and ci is always in decreasing order for each Id1. I need the output to be:
Id1 Id2 c
1 9 20
1 32 10
1 11 9
1 14 9
1 36 9
2 11 9
2 17 8
2 12 8
2 14 8
2 47 5
.
.
.
100
100
100
100
100 . .
so that each Id1 has five rows in the output: the five largest c values taken across all the ci columns, each with its matching Id2. How can this be achieved in R?
Using the development version of data.table:
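# using first six rows from your post
require(data.table) # v1.9.5+
# melt the Id2* columns into "Id2" and the c* columns into "c"
ans <- melt(setDT(df), measure = patterns("^Id2", "^c[0-9]$"),
            value.name = c("Id2", "c"))
# sort by c (descending), then keep the first 5 rows of each Id1 group
ans[order(-c), head(.SD, 5L), by = Id1, .SDcols = c("Id2", "c")]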
# Id1 Id2 c
# 1: 1 9 20
# 2: 1 32 10
# 3: 1 11 9
# 4: 1 14 9
# 5: 1 36 9
# 6: 2 11 9
# 7: 2 17 8
# 8: 2 12 8
# 9: 2 14 8
# 10: 2 47 5
Basically, melt can accept a list of column names and combine the columns from each element of the list into a separate column. Have a look at the result of lapply(...) to see which columns are combined together.
Then we order by the c column in decreasing order, group by Id1, and pick the first 5 rows from the subset of data belonging to each group.
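The exact lapply(...) call is elided above, so the following is only an assumed reconstruction: a minimal base R sketch that lists which column names each regular expression matches, which should correspond to the groups that get combined.

```r
# Column names of the example data (same as in the question)
cols <- c("Id1", "Id21", "c1", "Id22", "c2", "Id23", "c3", "Id24", "c4")

# The two regular expressions passed to patterns()
pats <- c("^Id2", "^c[0-9]$")

# For each pattern, list the column names it matches
groups <- lapply(pats, grep, x = cols, value = TRUE)
groups
```

The first element should hold the Id2 columns and the second the c columns; Id1 matches neither pattern, so it stays as an identifier column.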
You can use gather from tidyr and starts_with from dplyr to do this.
require(tidyr)
require(dplyr)
df %>%
gather(key = "Id2_key", value = "Id2", starts_with("Id2")) %>%
gather(key = "c_key", value = "c", starts_with("c"))
## Id1 Id2_key Id2 c_key c
## 1 1 Id21 20 c1 5
## 2 1 Id21 40 c1 4
## 3 1 Id21 43 c1 3
## 4 2 Id21 47 c1 5
## 5 2 Id21 5 c1 4
## 6 2 Id21 6 c1 0
## ... ...
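The two gather calls above produce every combination of an Id2 column with a c column, so a further step is needed to reach the requested output. One possible continuation, offered as a sketch rather than as the answerer's method, keeps only the rows where the suffixes agree (Id21 with c1, Id22 with c2, and so on) and then takes the five largest c values per Id1. The name top5 is made up for this illustration.

```r
library(tidyr)
library(dplyr)

# Example data from the question (first six rows)
df <- data.frame(
  Id1  = c(1, 1, 1, 2, 2, 2),
  Id21 = c(20, 40, 43, 47, 5, 6),   c1 = c(5, 4, 3, 5, 4, 0),
  Id22 = c(11, 14, 15, 17, 12, 10), c2 = c(9, 9, 3, 8, 8, 2),
  Id23 = c(9, 13, 23, 11, 14, 24),  c3 = c(20, 5, 1, 9, 8, 4),
  Id24 = c(32, 36, 39, 10, 28, 23), c4 = c(10, 9, 8, 5, 4, 2)
)

top5 <- df %>%
  gather(key = "Id2_key", value = "Id2", starts_with("Id2")) %>%
  gather(key = "c_key", value = "c", starts_with("c")) %>%
  # keep only matching pairs: Id21 with c1, Id22 with c2, ...
  filter(sub("^Id2", "", Id2_key) == sub("^c", "", c_key)) %>%
  group_by(Id1) %>%
  # largest c first within each Id1, then keep the first five rows
  arrange(desc(c), .by_group = TRUE) %>%
  slice(1:5) %>%
  ungroup() %>%
  select(all_of(c("Id1", "Id2", "c")))

top5
```

When several rows tie on c at the cut-off, which of them is kept depends on the row order after gathering, so the Id2 in the last row of a group may differ from other approaches.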
# Try this (df is your original data frame):
library(reshape2)
df1 <- melt(df, measure.vars = paste0("c", 1:4), variable.name = "c", value.name = "c_value")
df2 <- melt(df1, measure.vars = paste0("Id2", 1:4), variable.name = "Id2", value.name = "Id2_value")
head(df2)
Id1 c c_value Id2 Id2_value
1 1 c1 5 Id21 20
2 1 c1 4 Id21 40
3 1 c1 3 Id21 43
4 2 c1 5 Id21 47
5 2 c1 4 Id21 5
6 2 c1 0 Id21 6
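Melting the c columns and the Id2 columns separately pairs every c with every Id2, so the result still has to be reduced to matching pairs before the top five can be picked. As a package-free alternative, here is a base R sketch that stacks the paired columns directly and keeps the five largest c values per Id1. The names idx, long and top5 are made up for this illustration.

```r
# Example data from the question (first six rows)
df <- data.frame(
  Id1  = c(1, 1, 1, 2, 2, 2),
  Id21 = c(20, 40, 43, 47, 5, 6),   c1 = c(5, 4, 3, 5, 4, 0),
  Id22 = c(11, 14, 15, 17, 12, 10), c2 = c(9, 9, 3, 8, 8, 2),
  Id23 = c(9, 13, 23, 11, 14, 24),  c3 = c(20, 5, 1, 9, 8, 4),
  Id24 = c(32, 36, 39, 10, 28, 23), c4 = c(10, 9, 8, 5, 4, 2)
)

idx <- 1:4

# Stack Id21..Id24 into one column and c1..c4 into another,
# column by column, so each Id2 stays next to its own c
long <- data.frame(
  Id1 = rep(df$Id1, times = length(idx)),
  Id2 = unlist(df[paste0("Id2", idx)], use.names = FALSE),
  c   = unlist(df[paste0("c", idx)], use.names = FALSE)
)

# Sort by Id1, then by c in decreasing order (ties keep their stacked order)
long <- long[order(long$Id1, -long$c), ]

# Keep the first five rows of each Id1 group
top5 <- do.call(rbind, lapply(split(long, long$Id1), head, n = 5))
rownames(top5) <- NULL
top5
```

On the six example rows this should give the same ten rows as the expected output in the question.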
#data
df<-
structure(list(Id1 = c(1L, 1L, 1L, 2L, 2L, 2L), Id21 = c(20L,
40L, 43L, 47L, 5L, 6L), c1 = c(5L, 4L, 3L, 5L, 4L, 0L), Id22 = c(11L,
14L, 15L, 17L, 12L, 10L), c2 = c(9L, 9L, 3L, 8L, 8L, 2L), Id23 = c(9L,
13L, 23L, 11L, 14L, 24L), c3 = c(20L, 5L, 1L, 9L, 8L, 4L), Id24 = c(32L,
36L, 39L, 10L, 28L, 23L), c4 = c(10L, 9L, 8L, 5L, 4L, 2L)), .Names = c("Id1",
"Id21", "c1", "Id22", "c2", "Id23", "c3", "Id24", "c4"), class = "data.frame", row.names = c(NA,
-6L))