Group variable and sum under condition in R

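I want to sum the values of dollvalue for the rows where ispurchase == 1, but I can't find a working solution. I tried solutions from other posts, but they all seemed too complicated and did not work in the end. I tried a plyr approach that combines group_by and aggregate, but I get the error: argument "FUN" is missing.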

library(plyr)
returntrip <- roundtrips %>%
  group_by(id) %>%
  aggregate(purchcost = sum(dollvalue[ispurchase==1], 
                            FUN = sum)) %>%
  ungroup

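I also tried a plain aggregate call, which I think almost works, but I get the following error: Error in aggregate.data.frame(as.data.frame(x), ...) : arguments must have same length

I assume this is because the list and the data frame have different lengths. Is there any way to fix this?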

returntrip <- aggregate(x = roundtrips$dollvalue[roundtrips$ispurchase==1],
      by = list(roundtrips$id),
      FUN = sum)

Here is the head of the data frame:

                   ethamount                               dollvalue     id ispurchase             dollarcum
 1:  0.0000877963125548729991613761125535   -0.0010491659350307322180057001403952    883          1  0.000000000000000000
 2:  0.0010000000000000000208166817117217   -0.0107400000000000012817524819297432  36927          1  0.000000000000000000
 3: 75.4154000000000053205440053716301918 -804.6823180000000093059497885406017303   2637          1  0.000000000000000000
 4:  0.1066286798619889564232465772875003   -1.0662867986198896197436170041328296  72274          1  0.000000000000000000
 5:  0.0100000000000000002081668171172169   -0.1000000000000000055511151231257827  94359          1  0.010899999999999993
 6:  0.1000000000000000055511151231257827   -0.9460000000000001740829702612245455   3083          1  0.000000000000000000
 7:  1.0000000000000000000000000000000000   -9.3499999999999996447286321199499071 102645          1  0.000000000000000000
 8:  0.0000000000000000010000000000000001   -0.0000000000000000098900000000000005 117464          1  0.000000000000000000
 9:  0.0100000000000000002081668171172169   -0.1108999999999999985789145284797996  91239          1 -0.010899999999999993
10: 12.0000000000000000000000000000000000 -144.9600000000000079580786405131220818  52894          1  0.000000000000000000
11: 14.7899999999999991473487170878797770 -207.0600000000000022737367544323205948  80993          1  0.000000000000000000
12: 55.2299999999999968736119626555591822 -689.2703999999999950887286104261875153  74580          1  0.000000000000000000
13:  0.1000000000000000055511151231257827   -1.2480000000000002202682480856310576 116147          1  0.000000000000000000
14:  1.9995590000000000863167315401369706  -37.4517400699999996049882611259818077  36943          1  0.000000000000000000
15:  0.3914821535012809605724726225162158   -5.5786206873932533412130396754946560  86862          1  0.000000000000000000
16:  0.4893235858000000160217268785345368   -6.3122742568200003177025791956111789  88279          1  0.000000000000000000
17:  0.0001392130443151549901940194908789   -0.0016510667055777380248654528926977  72433          1  0.000000000000000000
18:  0.1000000000000000055511151231257827   -1.0160000000000000142108547152020037  68487          1  0.000000000000000000
19:  0.7211898100000000422227230956195854   -8.3946493884000012997148587601259351  28354          1  0.000000000000000000
20:  0.6650000000000000355271367880050093   -8.0265500000000002955857780762016773  80397          1  0.000000000000000000

Any kind of hint or solution is much appreciated.

Try the following code, which subsets dollvalue by the condition inside sum():

library(dplyr)
df %>%
  group_by(id) %>%
  summarise(
    purchcost = sum(dollvalue[ispurchase == 1]), .groups = "drop")

Output:

# A tibble: 20 × 2
       id purchcost
    <int>     <dbl>
 1    883 -1.05e- 3
 2   2637 -8.05e+ 2
 3   3083 -9.46e- 1
 4  28354 -8.39e+ 0
 5  36927 -1.07e- 2
 6  36943 -3.75e+ 1
 7  52894 -1.45e+ 2
 8  68487 -1.02e+ 0
 9  72274 -1.07e+ 0
10  72433 -1.65e- 3
11  74580 -6.89e+ 2
12  80397 -8.03e+ 0
13  80993 -2.07e+ 2
14  86862 -5.58e+ 0
15  88279 -6.31e+ 0
16  91239 -1.11e- 1
17  94359 -1   e- 1
18 102645 -9.35e+ 0
19 116147 -1.25e+ 0
20 117464 -9.89e-18
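
For the base R attempt in the question, the length error most likely arises because x is filtered to the rows with ispurchase == 1 while by still uses every row of roundtrips$id. A minimal sketch of one possible fix is to filter the rows once and take both the values and the grouping column from the filtered data. The toy data below is made up for illustration and only borrows the column names from the question:

```r
# Toy stand-in for the asker's `roundtrips` data (values are invented for illustration)
roundtrips <- data.frame(
  id         = c(1, 1, 2, 2, 3),
  dollvalue  = c(-10, -5, -2, 4, 7),
  ispurchase = c(1, 1, 1, 0, 0)
)

# Filter the rows first so the values and the grouping vector have the same length
purchases <- roundtrips[roundtrips$ispurchase == 1, ]

# Sum dollvalue per id over the purchase rows only
returntrip <- aggregate(
  x   = list(purchcost = purchases$dollvalue),
  by  = list(id = purchases$id),
  FUN = sum
)
returntrip
```

One difference from the dplyr version: filtering first drops any id that has no purchase rows (id 3 in the toy data), whereas summarise() with sum(dollvalue[ispurchase == 1]) keeps such an id with a sum of 0.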
