
Finding rows with sum of a column which is lower than a given value in R

I have a data frame (or data.table). I want to sort the rows in ascending order of a column and then select the rows whose cumulative total of that column stays just below a given value.

For example, take the mtcars data frame. I've sorted the rows in increasing order of the qsec column. Now I want to find the rows whose qsec values sum to less than, say, 100, such that adding the next row would push the sum past 100.

I wrote a while loop for this, but I am looking for a better, vectorized solution.

> head(mtcars[order(mtcars$qsec), ])
                mpg cyl disp  hp drat   wt  qsec vs am gear carb
Ford Pantera L 15.8   8  351 264 4.22 3.17 14.50  0  1    5    4
Maserati Bora  15.0   8  301 335 3.54 3.57 14.60  0  1    5    8
Camaro Z28     13.3   8  350 245 3.73 3.84 15.41  0  0    3    4
Ferrari Dino   19.7   6  145 175 3.62 2.77 15.50  0  1    5    6
Duster 360     14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
Mazda RX4      21.0   6  160 110 3.90 2.62 16.46  0  1    4    4

In data.table, use setorder() to sort the rows and cumsum() to keep the rows whose cumulative sum is below your cutoff:

library(data.table)
mtcars <- copy(mtcars)                            # because binding is locked
setDT(mtcars)                                     # convert to data.table   
setorder(mtcars, qsec)                            # reorder rows
out <- mtcars[cumsum(qsec) < 100]                 # filter rows
out
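The question also asks that adding the next row would push the total past the cutoff. A minimal base R sketch for checking that boundary is shown below; the names cutoff, running, and n_keep are illustrative and not part of the original answer. It relies on qsec being non-negative, so the running total never decreases and the rows below the cutoff form a contiguous prefix.

```r
# Sort the qsec values and compute the running total.
# datasets::mtcars is used so the check does not depend on the modified copy.
qsec_sorted <- sort(datasets::mtcars$qsec)
running <- cumsum(qsec_sorted)

# Number of leading rows whose running total stays below the cutoff.
cutoff <- 100   # illustrative name for the threshold from the question
n_keep <- sum(running < cutoff)

running[n_keep]       # running total at the last kept row (below the cutoff)
running[n_keep + 1]   # running total one row later (at or above the cutoff)
```

If every row fits under the cutoff, running[n_keep + 1] is NA, since there is no next row.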

In the tidyverse, use arrange() to sort the rows and filter() to select rows by a condition:

library(tidyverse)
mtcars %>% arrange(qsec) %>% filter(cumsum(qsec) < 100)
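For comparison, the same order-then-cumsum idea can be sketched in base R without any packages. This is not from the original answer; sorted and out_base are illustrative names. Because a plain data.frame is indexed directly, the car names stay in place as row names.

```r
# Sort the original data frame by qsec (ascending).
sorted <- datasets::mtcars[order(datasets::mtcars$qsec), ]

# Keep the leading rows whose cumulative qsec is below 100.
out_base <- sorted[cumsum(sorted$qsec) < 100, ]

rownames(out_base)   # car names of the selected rows
```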

Here are data.table and dplyr solutions that preserve the row names, i.e., the names of the cars, in line with the OP's expected result.

Note that both data.table and the tidyverse drop the row names attribute from data.frames by default. To keep the row names as a column of a data.table or tibble, respectively, this has to be requested explicitly.

data.table

library(data.table)
as.data.table(mtcars, key = "qsec", keep.rownames = TRUE)[cumsum(qsec) < 100]
               rn  mpg cyl disp  hp drat   wt  qsec vs am gear carb
1: Ford Pantera L 15.8   8  351 264 4.22 3.17 14.50  0  1    5    4
2:  Maserati Bora 15.0   8  301 335 3.54 3.57 14.60  0  1    5    8
3:     Camaro Z28 13.3   8  350 245 3.73 3.84 15.41  0  0    3    4
4:   Ferrari Dino 19.7   6  145 175 3.62 2.77 15.50  0  1    5    6
5:     Duster 360 14.3   8  360 245 3.21 3.57 15.84  0  0    3    4
6:      Mazda RX4 21.0   6  160 110 3.90 2.62 16.46  0  1    4    4

Here, as.data.table() replaces copy(), setDT(), and setorder() in one go. Setting the key on qsec sorts the rows in ascending order of qsec, as requested by the OP. In addition, data.table chaining is used.
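To see the effect of the key and of chaining, a small sketch along these lines may help. It assumes the data.table package is installed; dt and res are illustrative names, and the second bracket (selecting only rn and qsec) is added here purely to show a chain.

```r
library(data.table)

# Build the keyed data.table as in the one-liner above.
dt <- as.data.table(datasets::mtcars, key = "qsec", keep.rownames = TRUE)

key(dt)                # the key column set on dt
is.unsorted(dt$qsec)   # FALSE when qsec is already in ascending order

# Chaining: the second [ ] operates on the result of the first.
res <- dt[cumsum(qsec) < 100][, .(rn, qsec)]
res
```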

dplyr

library(dplyr)
mtcars %>% 
  as_tibble(rownames = "rn") %>% 
  arrange(qsec) %>% 
  filter(cumsum(qsec) < 100)
# A tibble: 6 x 12
  rn               mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Ford Pantera L  15.8     8   351   264  4.22  3.17  14.5     0     1     5     4
2 Maserati Bora   15       8   301   335  3.54  3.57  14.6     0     1     5     8
3 Camaro Z28      13.3     8   350   245  3.73  3.84  15.4     0     0     3     4
4 Ferrari Dino    19.7     6   145   175  3.62  2.77  15.5     0     1     5     6
5 Duster 360      14.3     8   360   245  3.21  3.57  15.8     0     0     3     4
6 Mazda RX4       21       6   160   110  3.9   2.62  16.5     0     1     4     4
