
R: filtering a data.table with dplyr fails

I am on Windows, using R 4.0.2, data.table 1.13.0 and dplyr 1.0.0.

This is such a weird bug that I can't make a reproducible example.

library(data.table)
library(dplyr)  # provides %>% and filter()

df2 = structure(list(total_amount = 9.39999961853027, tip_amount = 0,
                     total_amount = 9.39999961853027, passenger_count = 1L),
                row.names = c(NA, -1L), class = c("data.table", "data.frame"))

# this works
df2[total_amount > 10, ] 

# this works
df2 %>% 
  data.frame %>%
  filter(total_amount > 10)

# this doesn't work!!!
df2 %>% 
  filter(total_amount > 10)
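A likely explanation for why the middle variant works: df2 %>% data.frame calls data.frame(df2), and data.frame() defaults to check.names = TRUE, which runs the column names through make.names(unique = TRUE). The second total_amount is renamed to total_amount.1, so filter() never sees a duplicated name. The sketch below rebuilds the same four columns as a plain data.frame (so it runs without data.table installed) and compares the names with and without that renaming.

```r
# Same columns as df2, but with a plain data.frame class
x <- structure(
  list(total_amount = 9.4, tip_amount = 0, total_amount = 9.4, passenger_count = 1L),
  row.names = c(NA, -1L), class = "data.frame"
)

names(x)
#> "total_amount" "tip_amount" "total_amount" "passenger_count"

# data.frame() defaults to check.names = TRUE, which makes names unique
names(data.frame(x))
#> "total_amount" "tip_amount" "total_amount.1" "passenger_count"

# With check.names = FALSE the duplicated name is kept as-is
names(data.frame(x, check.names = FALSE))
#> "total_amount" "tip_amount" "total_amount" "passenger_count"
```

In other words, the data.frame step hides the problem rather than fixing it: filter() then silently uses the first total_amount column.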

and gives the error:

Error in .subset2(chunks, self$get_current_group()) : attempt to select less than one element in integerOneIndex

This is so perplexing. What is going on?

The issue seems to be that if two columns have the SAME name, then filter() errors.
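Since the failure comes down to duplicated column names, one option is to check for them before piping into dplyr, so the pipeline stops with a readable message instead of the integerOneIndex error. assert_unique_names() below is a hypothetical helper (not part of dplyr or data.table) and uses only base R.

```r
# Hypothetical helper: stop early, with a clear message, if any column
# name occurs more than once; otherwise return the input unchanged so it
# can sit in the middle of a %>% pipeline.
assert_unique_names <- function(df) {
  dups <- unique(names(df)[duplicated(names(df))])
  if (length(dups) > 0) {
    stop("duplicated column names: ", paste(dups, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}
```

With it, df2 %>% assert_unique_names() %>% filter(total_amount > 10) would stop with a message naming total_amount before filter() is ever reached.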

The reason for this is that your data.table is badly designed: you have two columns called total_amount. In that case, how is dplyr supposed to know what to do when filtering? It looks at your filter condition and then looks for total_amount in the table. It finds two columns with that name and throws an error, since there is no way of knowing which column to use. dplyr is doing what it should be doing. Essentially, your data is not tidy, and tidy data is what dplyr expects.
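If the goal is to make the data tidy before using dplyr, here is a minimal sketch of one way to do it, assuming you want to keep every distinct column: drop a repeated column when it holds exactly the same values as the first column with that name, and otherwise rename it with make.unique() so nothing is discarded silently. dedupe_columns() is a hypothetical helper; it returns a plain data.frame, which you can turn back into a data.table with data.table::setDT() if needed.

```r
# Hypothetical helper: remove duplicated column names from a data frame.
# A repeated column is dropped only if it is identical to the first column
# with that name; otherwise it is kept and renamed via make.unique().
dedupe_columns <- function(df) {
  df <- as.data.frame(df)  # use base data.frame indexing rules below
  nms <- names(df)
  dup <- duplicated(nms)
  # TRUE for a repeated column whose values match its first occurrence exactly
  redundant <- vapply(seq_along(nms), function(j) {
    dup[j] && identical(df[[match(nms[j], nms)]], df[[j]])
  }, logical(1))
  df <- df[!redundant]  # single index on a data.frame selects columns
  names(df) <- make.unique(names(df))
  df
}

# Example with the same columns as df2
x <- structure(
  list(total_amount = 9.4, tip_amount = 0, total_amount = 9.4, passenger_count = 1L),
  row.names = c(NA, -1L), class = "data.frame"
)
names(dedupe_columns(x))
#> "total_amount" "tip_amount" "passenger_count"
```

After this step, dedupe_columns(df2) %>% filter(total_amount > 10) no longer involves a duplicated name.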
