
Filtering rows in data.table while adding a column

The following returns a data.table with all 150 rows; rows that fail the condition get NA in the new column:

library(data.table)
library(dplyr)  # provides %>% and row_number()
irisDT <- iris %>% data.table
irisDT[Sepal.Width > 3, Petal.Width_rank := row_number(Petal.Width),
     by = "Species"]

However, I want the subsetting on Sepal.Width > 3 to actually drop the other rows rather than act as a "conditional mutate", i.e. I'm trying to get the equivalent of

library(dplyr)
iris %>%
  filter(Sepal.Width > 3) %>%
  group_by(Species) %>%
  mutate(Petal.Width_rank = row_number(Petal.Width))

What's the idiomatic way to do this in data.table?

Chain your calls:

data.table(iris)[
  Sepal.Width > 3
][,
  Petal.Width_rank := rank(Petal.Width, ties.method = "first"),
  by = Species
][]

This produces 67 rows.
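
To make the difference between the two forms concrete, here is a minimal sketch (not part of the original answer) that runs both on copies of iris and compares row counts. The names dt, in_place and chained are hypothetical, and frank() assumes a data.table version that ships it.

```r
library(data.table)

# Work on a data.table copy so the built-in iris data frame is left untouched.
dt <- as.data.table(iris)

# Conditional update: `:=` with a filter in `i` assigns only to matching rows.
# The other rows stay in the table and get NA in the new column.
in_place <- copy(dt)[Sepal.Width > 3,
                     Petal.Width_rank := frank(Petal.Width, ties.method = "first"),
                     by = Species]

# Chained form: the first [ ] returns a new, filtered table;
# the second [ ] adds the rank column to that filtered table only.
chained <- dt[Sepal.Width > 3][,
  Petal.Width_rank := frank(Petal.Width, ties.method = "first"),
  by = Species]

nrow(in_place)                          # same as nrow(iris)
nrow(chained)                           # number of rows with Sepal.Width > 3
sum(is.na(in_place$Petal.Width_rank))   # rows that were not ranked
```

The ranks themselves agree between the two forms; only the set of rows that is returned differs.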

You could try:

DT1 <- setDT(iris)[Sepal.Width > 3,
                   c(.SD, list(Petal.Width_rank = row_number(Petal.Width))),
                   by = Species]

dim(DT1)
#[1] 67  6

From data.table 1.9.5 onwards you can also use frank(), which offers several options for handling ties (as mentioned by @docendo discimus in the comments):

DT2 <- setDT(iris)[Sepal.Width > 3,
                   c(.SD, list(Petal.Width_rank = frank(Petal.Width, ties.method = 'first'))),
                   Species]

dim(DT2)
#[1] 67  6
identical(DT1, DT2)
#[1] TRUE
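
As a small illustration of what the ties options change, the sketch below ranks a made-up vector x (hypothetical data, not from the question) with a few of frank()'s ties.method values. ties.method = "first" is the one that matches dplyr's row_number().

```r
library(data.table)

# Toy vector with one tie: the two 0.2 values.
x <- c(0.2, 0.4, 0.2, 0.3)

r_first   <- frank(x, ties.method = "first")    # ties broken by position: 1 4 2 3
r_min     <- frank(x, ties.method = "min")      # tied values share the lowest rank: 1 4 1 3
r_dense   <- frank(x, ties.method = "dense")    # like "min" but with no gaps: 1 3 1 2
r_average <- frank(x, ties.method = "average")  # tied values share the mean rank: 1.5 4 1.5 3

# base::rank gives the same result for the "first" method.
r_base <- rank(x, ties.method = "first")
```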
