Conditionally summing values in column in R

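I have a large dataset that contains revenue and expenditure line items for every county in the US. One column (Type) holds a code for the kind of revenue or expenditure. I need to sum the total dollar amount (Flow) of total revenue and of total expenditure for each county. I have been trying the code below and keep getting this error: Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric-alike variables. Is there a way to fix this error, or another way to do what I want? Thanks in advance for your help!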

The data is linked here.

county2013$Expenditures <- county2013 %>%
  group_by(FIPS) %>%
  sum(county2013[which(county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B22' | county2013$Type == 'B30' | county2013$Type == 'B42' | county2013$Type == 'B46' | county2013$Type == 'B50' | county2013$Type == 'B59' | county2013$Type == 'B79' | county2013$Type == 'B80' | county2013$Type == 'B89' | county2013$Type == 'B91' | county2013$Type == 'B92' | county2013$Type == 'B93' | county2013$Type == 'B94' | county2013$Type == 'C21' | county2013$Type == 'C30' | county2013$Type == 'C42' | county2013$Type == 'C46' | county2013$Type == 'C50' | county2013$Type == 'C79' | county2013$Type == 'C80' | county2013$Type == 'C89' | county2013$Type == 'C91' | county2013$Type == 'C92' | county2013$Type == 'C93' | county2013$Type == 'C94' | county2013$Type == 'D21' | county2013$Type == 'D30' | county2013$Type == 'D42' | county2013$Type == 'D46' | county2013$Type == 'D50' | county2013$Type == 'D79' | county2013$Type == 'D80' | county2013$Type == 'D89' | county2013$Type == 'D91' | county2013$Type == 'D92' | county2013$Type == 'D93' | county2013$Type == 'D94' | county2013$Type == 'T01'| county2013$Type =='T09' | county2013$Type =='T10' | county2013$Type == 'T11' | county2013$Type == 'T12'  | county2013$Type == 'T13' | county2013$Type == 'T14' | county2013$Type =='T15' | county2013$Type == 'T16' | county2013$Type == 'T19'| county2013$Type == 'T20' | county2013$Type == 'T21' | county2013$Type == 'T22' | county2013$Type == 'T23' | county2013$Type == 'T24' | county2013$Type == 'T25' | county2013$Type == 'T27' | county2013$Type == 'T28' | county2013$Type == 'T29' | county2013$Type == 'T40' | county2013$Type == 'T41' | county2013$Type == 'T50' | county2013$Type == 'T51' | county2013$Type == 'T53' | county2013$Type == 'T99' | county2013$Type == 'A01' | county2013$Type == 'A03' | county2013$Type == 'A09' | county2013$Type == 'A10' | county2013$Type == 'A12' | county2013$Type == 'A16'| county2013$Type == 'A18'| 
county2013$Type == 'A21' | county2013$Type == 'A36' | county2013$Type == 'A44' | county2013$Type == 'A45' | county2013$Type == 'A50' | county2013$Type == 'A56' | county2013$Type == 'A59'| county2013$Type == 'A60'| county2013$Type == 'A61'| county2013$Type == 'A80'| county2013$Type == 'A81'| county2013$Type == 'A87'| county2013$Type == 'A89' | county2013$Type == 'U01' | county2013$Type == 'U11' | county2013$Type == 'U20' | county2013$Type == 'U21' | county2013$Type == 'U30' | county2013$Type == 'U40' | county2013$Type == 'U41' | county2013$Type == 'U50' | county2013$Type == 'U95' | county2013$Type == 'U99' | county2013$Type == 'A90' | county2013$Type == 'A91' | county2013$Type == 'A92' | county2013$Type == 'A93' | county2013$Type == 'A94' | county2013$Type == 'X01' | county2013$Type == 'X02' | county2013$Type == 'X05' | county2013$Type == 'X08' | county2013$Type == 'Y01' | county2013$Type == 'Y02' | county2013$Type == 'Y04' | county2013$Type == 'Y11' | county2013$Type == 'Y12' | county2013$Type == 'Y51' | county2013$Type == 'Y52'), 5])

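The error most likely occurs because the data frame that reaches sum() contains both character and numeric columns: the pipe passes the whole grouped data frame as the first argument of sum(), and sum() on a data frame only works when every column is numeric. The following solution is based on the limited information available and a made-up toy example: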

data
  [1] "B01" "B21" "B22" "B30" "B42" "B46" "B50" "B59" "B79" "B80" "B89" "B91"
 [13] "B92" "B93" "B94" "C21" "C30" "C42" "C46" "C50" "C79" "C80" "C89" "C91"
 [25] "C92" "C93" "C94" "D21" "D30" "D42" "D46" "D50" "D79" "D80" "D89" "D91"
 [37] "D92" "D93" "D94" "T01" "T09" "T10" "T11" "T12" "T13" "T14" "T15" "T16"
 [49] "T19" "T20" "T21" "T22" "T23" "T24" "T25" "T27" "T28" "T29" "T40" "T41"
 [61] "T50" "T51" "T53" "T99" "A01" "A03" "A09" "A10" "A12" "A16" "A18" "A21"
 [73] "A36" "A44" "A45" "A50" "A56" "A59" "A60" "A61" "A80" "A81" "A87" "A89"
 [85] "U01" "U11" "U20" "U21" "U30" "U40" "U41" "U50" "U95" "U99" "A90" "A91"
 [97] "A92" "A93" "A94" "X01" "X02" "X05" "X08" "Y01" "Y02" "Y04" "Y11" "Y12"
[109] "Y51" "Y52"

sum(county2013$Flow[county2013$Type %in% data])
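
The one-liner above returns a single grand total over all counties. For one total per county, the same %in% test can be combined with a grouping function. Below is a minimal base R sketch, assuming columns named FIPS, Type and Flow as in the question; the toy rows, the code 'Z99' and the shortened code vector are invented for illustration only.

```r
# Toy stand-in for county2013 (all values are made up for illustration)
county2013 <- data.frame(
  FIPS = c("01001", "01001", "01001", "01003", "01003"),
  Type = c("B01", "T01", "Z99", "B21", "Z99"),
  Flow = c(100, 50, 7, 30, 9),
  stringsAsFactors = FALSE
)

# Shortened version of the code vector; the full list is printed above
data <- c("B01", "B21", "T01")

# Keep only the rows whose Type is in the vector, then sum Flow per county
selected <- county2013[county2013$Type %in% data, ]
per_county <- tapply(selected$Flow, selected$FIPS, sum)
per_county
```

tapply() returns a named vector with one total per FIPS value, so per_county[["01001"]] looks up a single county.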

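I hope I have understood what you are trying to accomplish. There are 323 unique "Type" values, some of which are expenditures and some revenues. You want to group by revenue or expenditure.

I don't know whether the codes you listed are expenditures or revenues, but for the sake of example let's assume they are expenditures.

Rather than writing out a very long "county2013$Type == 'B01' | county2013$Type == 'B21'" and so on, it is easier to put the Type values that correspond to expenditures into a vector. To save you the typing, I used stringr to process your code and build that vector.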

library(tidyverse)
text <- c("county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B01' | county2013$Type == 'B21' | county2013$Type == 'B22' | county2013$Type == 'B30' | county2013$Type == 'B42' | county2013$Type == 'B46' | county2013$Type == 'B50' | county2013$Type == 'B59' | county2013$Type == 'B79' | county2013$Type == 'B80' | county2013$Type == 'B89' | county2013$Type == 'B91' | county2013$Type == 'B92' | county2013$Type == 'B93' | county2013$Type == 'B94' | county2013$Type == 'C21' | county2013$Type == 'C30' | county2013$Type == 'C42' | county2013$Type == 'C46' | county2013$Type == 'C50' | county2013$Type == 'C79' | county2013$Type == 'C80' | county2013$Type == 'C89' | county2013$Type == 'C91' | county2013$Type == 'C92' | county2013$Type == 'C93' | county2013$Type == 'C94' | county2013$Type == 'D21' | county2013$Type == 'D30' | county2013$Type == 'D42' | county2013$Type == 'D46' | county2013$Type == 'D50' | county2013$Type == 'D79' | county2013$Type == 'D80' | county2013$Type == 'D89' | county2013$Type == 'D91' | county2013$Type == 'D92' | county2013$Type == 'D93' | county2013$Type == 'D94' | county2013$Type == 'T01'| county2013$Type =='T09' | county2013$Type =='T10' | county2013$Type == 'T11' | county2013$Type == 'T12'  | county2013$Type == 'T13' | county2013$Type == 'T14' | county2013$Type =='T15' | county2013$Type == 'T16' | county2013$Type == 'T19'| county2013$Type == 'T20' | county2013$Type == 'T21' | county2013$Type == 'T22' | county2013$Type == 'T23' | county2013$Type == 'T24' | county2013$Type == 'T25' | county2013$Type == 'T27' | county2013$Type == 'T28' | county2013$Type == 'T29' | county2013$Type == 'T40' | county2013$Type == 'T41' | county2013$Type == 'T50' | county2013$Type == 'T51' | county2013$Type == 'T53' | county2013$Type == 'T99' | county2013$Type == 'A01' | county2013$Type == 'A03' | county2013$Type == 'A09' | county2013$Type == 'A10' | county2013$Type == 'A12' | county2013$Type == 'A16'| county2013$Type == 'A18'| county2013$Type == 
'A21' | county2013$Type == 'A36' | county2013$Type == 'A44' | county2013$Type == 'A45' | county2013$Type == 'A50' | county2013$Type == 'A56' | county2013$Type == 'A59'| county2013$Type == 'A60'| county2013$Type == 'A61'| county2013$Type == 'A80'| county2013$Type == 'A81'| county2013$Type == 'A87'| county2013$Type == 'A89' | county2013$Type == 'U01' | county2013$Type == 'U11' | county2013$Type == 'U20' | county2013$Type == 'U21' | county2013$Type == 'U30' | county2013$Type == 'U40' | county2013$Type == 'U41' | county2013$Type == 'U50' | county2013$Type == 'U95' | county2013$Type == 'U99' | county2013$Type == 'A90' | county2013$Type == 'A91' | county2013$Type == 'A92' | county2013$Type == 'A93' | county2013$Type == 'A94' | county2013$Type == 'X01' | county2013$Type == 'X02' | county2013$Type == 'X05' | county2013$Type == 'X08' | county2013$Type == 'Y01' | county2013$Type == 'Y02' | county2013$Type == 'Y04' | county2013$Type == 'Y11' | county2013$Type == 'Y12' | county2013$Type == 'Y51' | county2013$Type == 'Y52'")


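# Pull out every "== 'XXX" piece; \\s* also tolerates a missing space or a line break after ==
ex <- text %>%
  str_extract_all("==\\s*'.{3}") %>%
  unlist() %>%
  str_replace_all("==\\s*'", "") %>%
  unique()  # the pasted condition lists B01 and B21 twice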

We now have a vector called ex that contains all the Type codes that represent expenditures.
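
The extraction can be sanity-checked on a short sample before running it on the full condition. The sketch below uses base R only (regmatches and gregexpr instead of stringr) and a made-up three-code sample that reproduces two quirks of the pasted condition: a missing space after == and a line break before a quoted code.

```r
# Short sample in the style of the question's condition, with a missing
# space (=='T09'), a line break before 'A21', and a repeated code (B01)
text <- "county2013$Type == 'B01' | county2013$Type =='T09' | county2013$Type == \n'A21' | county2013$Type == 'B01'"

# Match every three-character token wrapped in single quotes
quoted <- regmatches(text, gregexpr("'[^']{3}'", text))[[1]]

# Strip the quotes and drop repeated codes
codes <- unique(gsub("'", "", quoted))
codes
```

Matching on the quotes rather than on the == operator makes the result independent of how the spaces around == happen to be typed.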

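The second step is to add a column to the X2013 data frame that says whether each row is an expenditure or a revenue. To do this we use mutate together with ifelse and the %in% operator, which labels each row according to whether its Type is present in the ex vector.

The third step is to group_by County and the newly created column exp_rev, and then sum Flow with summarize.

Hopefully this corresponds to what you are trying to achieve: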

X2013 %>% mutate(exp_rev = ifelse(Type %in% ex, "Expenditure", "Revenue")) %>% 
  group_by(County, exp_rev) %>%
  summarize(Flow_sum = sum(Flow))

# A tibble: 510 x 3
# Groups:   County [255]
   County exp_rev       Flow_sum
   <chr>  <chr>            <dbl>
 1 000    Expenditure 1940831046
 2 000    Revenue     9093188172
 3 001    Expenditure   49803211
 4 001    Revenue      239163156
 5 002    Expenditure   41984941
 6 002    Revenue      212603913
 7 003    Expenditure   22009389
 8 003    Revenue      104761407
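
If one row per county with separate expenditure and revenue columns is more convenient, the same labelling step can feed a two-way tapply(). This is a base R sketch under the same assumption as above (every code not in ex counts as revenue); the toy rows, the code 'Z99' and the shortened ex vector are invented for illustration.

```r
# Toy stand-in for X2013 (all values are made up for illustration)
X2013 <- data.frame(
  County = c("001", "001", "001", "002", "002"),
  Type   = c("B01", "T01", "Z99", "B21", "Z99"),
  Flow   = c(100, 50, 7, 30, 9),
  stringsAsFactors = FALSE
)

# Shortened, hypothetical list of expenditure codes
ex <- c("B01", "B21", "T01")

# Label each row, treating everything outside ex as revenue
X2013$exp_rev <- ifelse(X2013$Type %in% ex, "Expenditure", "Revenue")

# Rows are counties, columns are the two labels, cells are summed Flow
totals <- tapply(X2013$Flow, list(X2013$County, X2013$exp_rev), sum)
totals
```

The result is a matrix, so totals["001", "Expenditure"] returns a single county's figure, and as.data.frame(totals) turns it into columns that can be merged back by county.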
