I have the following two datasets, and I am trying to find the first observation of each group. In the example below, grouping by "id" in the first dataset ("df1") worked as expected (case1). It also worked when I grouped by "id2" in the second dataset ("df2") (case2a). However, it did not work as expected when I grouped by "id1" in the second dataset (case2b). Surprisingly, I got the expected output after converting "id1" to a character vector.
#case1
df1<- structure(list(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3), stopId = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"),
stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2)), .Names = c("id",
"stopId", "stopSequence"), row.names = c(NA, -9L), class = "data.frame")
# first observation of each id:
library(data.table)
setDT(df1)[,.SD[1,],by=.(id)] #worked
#df2
df2<-structure(list(id1 = c(201601072952201, 201601072952201, 201601072952201,
201601072952213, 201601072952213, 201601072952213, 201601072952212,
201601072952212, 201601072952212, 201601072952176), id2 = c("TXT",
"TXT", "TXT", "TXT", "TXT", "TXT", "PLP", "PLP", "PLP", "KYK"
), sb = c(32L, 32L, 32L, 32L, 32L, 32L, 58L, 58L, 58L, 6L), bb = c(7L,
7L, 7L, 56L, 56L, 56L, 28L, 28L, 28L, 47L), qt = c(21, 21, 21,
420, 420, 420, 1000, 1000, 1000, 13), amt = c(301, 301, 301,
306, 306, 306, 515, 515, 515, 368), rate = c(6321, 6321, 6321,
128520, 128520, 128520, 515000, 515000, 515000, 4784)), .Names = c("id1",
"id2", "sb", "bb", "qt", "amt", "rate"), class = "data.frame", row.names = c(NA,
-10L))
#case2a
setDT(df2)[,.SD[1,],by=.(id2)] #worked
id2 id1 sb bb qt amt rate
1: TXT 201601072952201 32 7 21 301 6321
2: PLP 201601072952212 58 28 1000 515 515000
3: KYK 201601072952176 6 47 13 368 4784
#case2b
setDT(df2)[,.SD[1,],by=.(id1)] #did not work as expected
id1 id2 sb bb qt amt rate
1: 201601072952201 TXT 32 7 21 301 6321
df2$id1<-as.character(df2$id1)
setDT(df2)[,.SD[1,],by=.(id1)] # worked
So my question is: why do I need to convert the numeric variable into a character variable in case 2b but not in case 1?
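A likely explanation, offered as an assumption rather than something confirmed in this thread: data.table versions before 1.9.8 rounded the last 2 bytes of numeric (double) values by default when grouping, which leaves roughly 11 significant decimal digits. The ids in df2 have 15 digits and differ only in the last three, so they can collapse into one group, while the small ids in df1 (1, 2, 3) are unaffected. The sketch below uses a cut-down stand-in for df2 (the name `dt` and the reduced column set are illustrative) and compares two ways around this: turning rounding off with `setNumericRounding(0)`, and the character conversion from the question.

```r
library(data.table)

# Cut-down stand-in for df2; the id values are copied from the question
dt <- data.table(
  id1 = c(201601072952201, 201601072952201, 201601072952213,
          201601072952213, 201601072952212, 201601072952176),
  qt  = c(21, 21, 420, 420, 1000, 13)
)

# The ids are doubles below 2^53, so each is stored exactly and base R
# sees four distinct values
print(length(unique(dt$id1)))

# Assumption: the collapse in case2b is caused by data.table rounding the
# last 2 bytes of doubles when grouping (the default before v1.9.8).
# Setting the rounding to 0 bytes makes grouping compare the full value.
setNumericRounding(0)
first_by_num <- dt[, .SD[1], by = id1]

# The workaround from the question: group on a character copy of the id
dt[, id1_chr := as.character(id1)]
first_by_chr <- dt[, .SD[1], by = id1_chr]

print(first_by_num)
print(first_by_chr)
```

If the ids are identifiers rather than quantities, storing them as character from the start avoids depending on the rounding setting at all.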
Try using standard base R functions. For example:
df2[!duplicated(df2$id1),]
Output:
id1 id2 sb bb qt amt rate
1: 2.016011e+14 TXT 32 7 21 301 6321
2: 2.016011e+14 TXT 32 56 420 306 128520
3: 2.016011e+14 PLP 58 28 1000 515 515000
4: 2.016011e+14 KYK 6 47 13 368 4784
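Because the default print shows every id as 2.016011e+14, it is hard to see from the output above that the four rows really belong to different ids. A minimal base R sketch, assuming a plain data.frame with only two of the original columns (the name `df` is illustrative), keeps the first row per id with `duplicated()` and then formats the ids without scientific notation:

```r
# Stand-in for df2 with two of its columns; values copied from the question
df <- data.frame(
  id1 = c(201601072952201, 201601072952201, 201601072952201,
          201601072952213, 201601072952213, 201601072952213,
          201601072952212, 201601072952212, 201601072952212,
          201601072952176),
  amt = c(301, 301, 301, 306, 306, 306, 515, 515, 515, 368)
)

# duplicated() flags every repeat of an id, so negating it keeps the
# first row of each group
first_rows <- df[!duplicated(df$id1), ]

# Show the ids in full instead of as 2.016011e+14
first_rows$id1 <- format(first_rows$id1, scientific = FALSE)
print(first_rows)
```

Note that the last step turns `id1` into a character column, so it is meant for display only.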