
Converting numeric variable into character variable in group by id in data.table

I have the following two datasets, and I am trying to find the first observation of each group. In the example below, grouping by "id" in the first dataset (df1) worked as expected (case 1). It also worked when I grouped by "id2" in the second dataset (df2) (case 2a). However, grouping by "id1" in the second dataset did not work as expected (case 2b): only one row was returned. Surprisingly, I got the expected output after converting "id1" to a character vector.

#case1
df1<- structure(list(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3), stopId = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), 
    stopSequence = c(1, 2, 3, 3, 1, 4, 3, 1, 2)), .Names = c("id", 
"stopId", "stopSequence"), row.names = c(NA, -9L), class = "data.frame")

# first observation of each id: 

setDT(df1)[,.SD[1,],by=.(id)] #worked

#df2
df2<-structure(list(id1 = c(201601072952201, 201601072952201, 201601072952201, 
201601072952213, 201601072952213, 201601072952213, 201601072952212, 
201601072952212, 201601072952212, 201601072952176), id2 = c("TXT", 
"TXT", "TXT", "TXT", "TXT", "TXT", "PLP", "PLP", "PLP", "KYK"
), sb = c(32L, 32L, 32L, 32L, 32L, 32L, 58L, 58L, 58L, 6L), bb = c(7L, 
7L, 7L, 56L, 56L, 56L, 28L, 28L, 28L, 47L), qt = c(21, 21, 21, 
420, 420, 420, 1000, 1000, 1000, 13), amt = c(301, 301, 301, 
306, 306, 306, 515, 515, 515, 368), rate = c(6321, 6321, 6321, 
128520, 128520, 128520, 515000, 515000, 515000, 4784)), .Names = c("id1", 
"id2", "sb", "bb", "qt", "amt", "rate"), class = "data.frame", row.names = c(NA, 
-10L))
#case2a
setDT(df2)[,.SD[1,],by=.(id2)] #worked
   id2             id1 sb bb   qt amt   rate
1: TXT 201601072952201 32  7   21 301   6321
2: PLP 201601072952212 58 28 1000 515 515000
3: KYK 201601072952176  6 47   13 368   4784

#case2b
setDT(df2)[,.SD[1,],by=.(id1)] # did not work as expected
               id1 id2 sb bb qt amt rate
1: 201601072952201 TXT 32  7 21 301 6321

df2$id1<-as.character(df2$id1)
 setDT(df2)[,.SD[1,],by=.(id1)] # worked

So my question is: why do I need to convert the numeric variable to a character variable in case 2b but not in case 1?

Try using standard base R functions. For example:

df2[!duplicated(df2$id1),]

Output:

            id1 id2 sb bb   qt amt   rate
1: 2.016011e+14 TXT 32  7   21 301   6321
2: 2.016011e+14 TXT 32 56  420 306 128520
3: 2.016011e+14 PLP 58 28 1000 515 515000
4: 2.016011e+14 KYK  6 47   13 368   4784
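
The scientific notation in the output above hides the fact that the four id1 values are different. As a minimal sketch using only base R (the vector name `ids` is introduced here for illustration), the following checks that the values are distinct as doubles and prints them in full:

```r
# The four distinct id1 values from df2 in the question
ids <- c(201601072952201, 201601072952213, 201601072952212, 201601072952176)

# All values are below 2^53, so each one is stored exactly as a double
all(ids < 2^53)

# Base R compares doubles exactly; 0 means no value duplicates an earlier one
anyDuplicated(ids)

# Print the ids in fixed notation instead of 2.016011e+14
format(ids, scientific = FALSE)
```

This suggests the ids themselves are fine as numerics, and that the single group in case 2b comes from how the grouping compared them rather than from the data.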

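A possible explanation for case 2b, which the original post does not confirm: data.table can round the trailing bytes of a double before grouping or joining, controlled by `setNumericRounding()`. To my knowledge, older releases rounded the last 2 bytes by default, which would make large values that differ only in their last few digits fall into the same group, while more recent releases default to no rounding. The sketch below assumes data.table is installed; `dt`, `old` and `n_groups_exact` are names introduced here for illustration.

```r
library(data.table)

# Rebuild only the id1 column of df2 from the question
ids <- c(201601072952201, 201601072952213, 201601072952212, 201601072952176)
dt <- data.table(id1 = rep(ids, times = c(3, 3, 3, 1)))

# Number of trailing bytes currently rounded when comparing doubles (0, 1 or 2)
old <- getNumericRounding()
old

# Grouping under the current setting
dt[, .N, by = id1]

# Disable rounding so doubles are compared exactly, then group again
setNumericRounding(0)
n_groups_exact <- nrow(dt[, .N, by = id1])
n_groups_exact

# Restore the previous setting
setNumericRounding(old)
```

If `getNumericRounding()` reports a non-zero value in your session, that would be consistent with the single group seen in case 2b. Converting id1 to character, as in the question, sidesteps the issue because strings are compared exactly.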
