
How to calculate row sums or counts on selected columns with condition using tidyverse?

I have the following data frame (a subset of a larger data frame with more than 3000 observations, where year has 2 levels):

rp.pptn <- data.frame(id = c("150015", "150016", "150017", "150018", 
"150019", "150020"), year = structure(c(1L, 1L, 1L, 1L, 1L, 1L),
.Label = c("15", "18"), class = "factor"), 
freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3), 
freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1), 
freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3), 
freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3), 
freqrain = c(1, 3, 2, 3, 1, 3))

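I want to count the values in columns c(3:11) that meet a condition. I have been trying rowSums, because when I drop id and the grouping variable year, rowSums actually gives me a count like this: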

rp.pptn.no.id <- rp.pptn %>%
   select(c(3:11)) %>%
   # "." is the selected data frame; each comparison yields a logical matrix
   mutate(pptnlow = rowSums(. == 1 | . == 2 | . == 6))

I am also able to compute rowSums over selected columns as follows:

rp.pptn <- rp.pptn %>% 
   mutate(pptnlow = rowSums(.[c(3:11)]))

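However, since I need id and year for later analysis, I would like to do both steps in one go. I am also curious why, given that my data are numeric, the first rowSums call gives me counts rather than sums. What I actually want is the count, i.e. how many columns meet my condition?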

Searching led me to think that something along these lines might work:

rp.pptn <- rp.pptn %>% 
  mutate(pptnlow = rowSums(. [3:11]) %in% c(1, 2, 6))

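This returns a logical vector that is all FALSE, presumably because my condition is not met. I don't think I am far off, but ultimately what I want is the following df: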

rp.pptn <- data.frame(id = c("150015", "150016", "150017", "150018", 
"150019", "150020"), year = structure(c(1L, 1L, 1L, 1L, 1L, 1L), 
.Label = c("15", "18"), class = "factor"), 
freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3), 
freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1), 
freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3), 
freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3), 
freqrain = c(1, 3, 2, 3, 1, 3), pptnlow = c(7, 6, 8, 4, 5, 2))

As mentioned, my actual dataset is much larger, so the more automated the better! Thanks.

One option is reduce with map:

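library(tidyverse)
# For each target value, flag matching cells in columns 3:11 and add the
# logical columns together to get a per-row count for that value.
# A plain function replaces funs(), which is deprecated since dplyr 0.8.0.
map(c(1, 2, 6), ~ rp.pptn %>% 
                   transmute_at(3:11, function(col) col == .x) %>% 
                   reduce(`+`)) %>% 
                   reduce(`+`) %>%   # add up the three per-value counts
     mutate(rp.pptn, pptnlow = .)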

Or with rowSums and map:

map(c(1, 2, 6), ~ 
        rp.pptn %>% 
          select(3:11) %>% 
          transmute(pptnlow = rowSums(. == .x)))  %>% 
      bind_cols %>% 
      rowSums %>% 
      mutate(rp.pptn, pptnlow = .)
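
As background on why rowSums returns counts in the question's first attempt, and why the %in% attempt is all FALSE, here is a minimal base R sketch. The data frame df and its columns q1 to q3 are hypothetical stand-ins, not the asker's data. A comparison on numeric columns produces a logical matrix, and rowSums treats TRUE as 1 and FALSE as 0, so the result is a per-row count. Applying %in% to the output of rowSums instead tests whether each row's sum is one of the target values.

```r
# Hypothetical stand-in data, not the asker's data frame
df <- data.frame(id = c("a", "b", "c"),
                 q1 = c(1, 3, 2),
                 q2 = c(2, 5, 6),
                 q3 = c(4, 1, 6))

# Comparing a data frame with a value returns a logical matrix
hits <- df[2:4] == 1 | df[2:4] == 2 | df[2:4] == 6

# TRUE counts as 1 and FALSE as 0, so this is a count of matches per row
n_low <- rowSums(hits)

# Here %in% is applied to the row sums, so it asks whether each *sum*
# is 1, 2 or 6, which is a different question from counting matching cells
sum_in_set <- rowSums(df[2:4]) %in% c(1, 2, 6)
```

In this sketch n_low holds one count per row, while sum_in_set only says whether a row total happens to equal a target value.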

We can use mutate_at to replace the values with TRUE or FALSE based on the condition, take rowSums of the result, and then bind it to the original data frame.

library(dplyr)

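rp.pptn2 <- rp.pptn %>%
  # turn columns 3:11 into logicals (a lambda replaces the deprecated funs())
  mutate_at(vars(3:11), ~ .x %in% c(1, 2, 6)) %>%
  # count the TRUE values in each row
  transmute(pptnlow = rowSums(.[, 3:11])) %>%
  bind_cols(rp.pptn, .)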
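
The same idea can probably be written in a single mutate step with across(), which keeps id and year in place. This is a sketch that assumes dplyr 1.0.0 or later; across() does not appear in the answers above. The data frame is the one from the question.

```r
library(dplyr)

# Data frame copied from the question
rp.pptn <- data.frame(
  id = c("150015", "150016", "150017", "150018", "150019", "150020"),
  year = factor(rep("15", 6), levels = c("15", "18")),
  freqtools = c(1, 1, 2, 1, 1, 3), freqtrees = c(2, 3, 3, 5, 4, 3),
  freqrt = c(2, 2, 2, 2, 1, 3), freqroamfriends = c(1, 1, 1, 3, 1, 1),
  freqroamalone = c(1, 1, 1, 2, 1, 1), freqparts = c(2, 2, 2, 2, 3, 3),
  freqmessy = c(5, 5, 2, 5, 4, 5), freqride = c(3, 1, 2, 5, 3, 3),
  freqrain = c(1, 3, 2, 3, 1, 3))

# across() returns the selected columns as logicals after the test;
# rowSums() then counts the TRUE values in each row
result <- rp.pptn %>%
  mutate(pptnlow = rowSums(across(3:11, ~ .x %in% c(1, 2, 6))))

result$pptnlow
# matches the pptnlow column requested in the question: 7 6 8 4 5 2
```

Selecting by name, for example across(starts_with("freq"), ...), may be more robust than positions if the column order changes in the larger dataset.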
