
R: sum vector by vector of conditions

I am trying to get a vector that contains the sums of the elements that satisfy each condition.

    values = runif(5000)
    bin = seq(0, 0.9, by = 0.1)
    sum(values < bin)

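I expected sum to return 10 values: for each element of "bin", the sum of the elements of "values" that satisfy the "<" condition. However, it returns only one value. How can I get this result without using a while loop?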

I take this to mean that, for each value in bin, you want the number of elements of values that are less than it. So I think you want vapply() here:

vapply(bin, function(x) sum(values < x), 1L)
# [1]    0  497 1025 1501 1981 2461 2955 3446 3981 4526

If you want a small table for reference, you can add names:

v <- vapply(bin, function(x) sum(values < x), 1L)
setNames(v, bin)
#   0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9 
#   0  497 1025 1501 1981 2461 2955 3446 3981 4526 
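
As a hedged illustration of why the original `sum(values < bin)` returns a single number: `<` recycles the 10 thresholds along the 5000 values, so the comparison is elementwise rather than "every value against every threshold", and `sum()` then collapses the result. The sketch below also shows `colSums(outer(...))` as one loop-free alternative; the seed and the variable names `recycled`, `counts_outer` and `counts_vapply` are illustrative choices, not part of the original thread.

```r
set.seed(42)  # arbitrary seed, used here only to make the sketch reproducible
values <- runif(5000)
bin <- seq(0, 0.9, by = 0.1)

# bin (length 10) is recycled along values (length 5000), so this is an
# elementwise comparison that yields one logical per element of values.
recycled <- values < bin
length(recycled)

# sum() collapses that logical vector into a single count.
length(sum(recycled))

# Comparing every value with every threshold gives a 5000 x 10 logical
# matrix; its column sums are the per-threshold counts.
counts_outer <- colSums(outer(values, bin, "<"))

# The same counts via the vapply() approach shown above.
counts_vapply <- vapply(bin, function(x) sum(values < x), 1L)
all(counts_outer == counts_vapply)
```

The `outer()` version builds the full comparison matrix in memory, so for very long vectors the `vapply()` form is likely the more economical of the two.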

Personally, I prefer data.table over tapply or vapply, and findInterval over cut:

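set.seed(1)
library(data.table)
# Assumption: the question's data is regenerated after seeding
# (these two lines are not shown in the original post)
values <- runif(5000)
bin <- seq(0, 0.9, by = 0.1)
dt <- data.table(values, groups = findInterval(values, bin))
setkey(dt, groups)
dt[, .(n = .N, v = sum(values)), groups][, list(cumsum(n), cumsum(v)), ]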
#      V1         V2
# 1:  537   26.43445
# 2: 1041  101.55686
# 3: 1537  226.12625
# 4: 2059  410.41487
# 5: 2564  637.18782
# 6: 3050  904.65876
# 7: 3473 1180.53342
# 8: 3951 1540.18559
# 9: 4464 1976.23067
#10: 5000 2485.44920

cbind(vapply(bin, function(x) sum(values < x), 1L)[-1],
      cumsum(tapply(values, cut(values, bin), sum)))
#          [,1]       [,2]
#(0,0.1]    537   26.43445
#(0.1,0.2] 1041  101.55686
#(0.2,0.3] 1537  226.12625
#(0.3,0.4] 2059  410.41487
#(0.4,0.5] 2564  637.18782
#(0.5,0.6] 3050  904.65876
#(0.6,0.7] 3473 1180.53342
#(0.7,0.8] 3951 1540.18559
#(0.8,0.9] 4464 1976.23067
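
The data.table result has 10 rows while the cut()-based one has 9. A small sketch of the likely reason, using three made-up probe values (the vector `x` is hypothetical, chosen only to hit the edge cases): with default arguments, findInterval() treats intervals as closed on the left and puts anything above the last break into the last interval, whereas cut() treats them as closed on the right and returns NA outside the breaks.

```r
bin <- seq(0, 0.9, by = 0.1)

# Hypothetical probe values: one inside the first bin, one exactly on a
# break, and one above the last break.
x <- c(0.05, 0.1, 0.95)

# findInterval(): intervals are [a, b); values >= the last break get the
# index of the last break.
fi <- findInterval(x, bin)
fi

# cut(): intervals are (a, b] by default; values outside the breaks are NA.
ct <- as.character(cut(x, bin))
ct
```

Passing a break vector that extends to 1, or adding `Inf` as a final break, is one way to make cut() keep the values above 0.9.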

Using tapply with a cut() vector as the INDEX seems to do it:

 tapply(  values,  cut(values, bin), sum)
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] 
 25.43052  71.06897 129.99698 167.56887 222.74620 277.16395 
(0.6,0.7] (0.7,0.8] (0.8,0.9] 
332.18292 368.49341 435.01104 

Although I would guess you want the cut vector to extend to 1.0:

bin = seq(0, 1, by = 0.1)
tapply(  values,  cut(values, bin), sum)

  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] 
 25.48087  69.87902 129.37348 169.46013 224.81064 282.22455 
(0.6,0.7] (0.7,0.8] (0.8,0.9]   (0.9,1] 
335.43991 371.60885 425.66550 463.37312 

I see that I understood the question differently from Richard. If you want his result, you can use cumsum on my result.
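
A minimal sketch of that cumsum step, under the assumption that the data is generated as in the question (the seed and the names `per_bin_sum`, `per_bin_n`, `cum_sum` and `cum_n` are illustrative). Note that cumsum over the per-bin sums gives cumulative sums; to get cumulative counts like the vapply() answer, the per-bin statistic would need to be `length` rather than `sum`.

```r
set.seed(42)  # arbitrary seed, used here only to make the sketch reproducible
values <- runif(5000)
bin <- seq(0, 1, by = 0.1)

# Per-bin sums, as in the tapply() answer above.
per_bin_sum <- tapply(values, cut(values, bin), sum)

# Per-bin counts: same grouping, but counting elements instead of summing.
per_bin_n <- tapply(values, cut(values, bin), length)

# Running totals across bins.
cum_sum <- cumsum(per_bin_sum)
cum_n <- cumsum(per_bin_n)

# cut() uses right-closed intervals, so the running count at each upper
# break corresponds to sum(values <= break).
check <- vapply(bin[-1], function(x) sum(values <= x), 1L)
all(cum_n == check)
```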

Using dplyr:

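set.seed(1)
library(dplyr)
# df is not defined in the original post; this construction is an assumption
# inferred from the cut()-style group labels in the output
values <- runif(5000)
bin <- seq(0, 0.9, by = 0.1)
df <- data.frame(values, groups = cut(values, bin))
df %>% group_by(groups) %>%
  summarise(count = n(), sum = sum(values)) %>%
  mutate(cumcount = cumsum(count), cumsum = cumsum(sum))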

Output:

      groups count       sum cumcount     cumsum
1    (0,0.1]   537  26.43445      537   26.43445
2  (0.1,0.2]   504  75.12241     1041  101.55686
3  (0.2,0.3]   496 124.56939     1537  226.12625
4  (0.3,0.4]   522 184.28862     2059  410.41487
5  (0.4,0.5]   505 226.77295     2564  637.18782
6  (0.5,0.6]   486 267.47094     3050  904.65876
7  (0.6,0.7]   423 275.87466     3473 1180.53342
8  (0.7,0.8]   478 359.65217     3951 1540.18559
9  (0.8,0.9]   513 436.04508     4464 1976.23067
10        NA   536 509.21853     5000 2485.44920

