
R for loop for filtering and summarizing a dataframe (with dplyr)?

I am using a simple dplyr command to first filter a dataframe by two columns and then report the sum of another column. However, I would like to create a loop so that the filtering criteria can be driven by a list of values. For example, the code for a single instance:

library(dplyr)
# sample() is random, so the sums differ between runs unless a seed is set
df = data.frame(Category1 = sample(c("FilterMe","DoNotFilterMe"), 15, replace=TRUE), 
          Category2 = sample(c("1","3","5","10"),15, replace=TRUE),
          Value = 1:15)

df %>%
  filter(Category1=="FilterMe" & Category2=="1") %>%   # comparison needs ==, not =
  summarize(result=sum(Value))

This works perfectly and I get a single value of 15. However, I would like to loop the command so that I can run it for multiple values of Category2, defined by a list of integers (not sequential). I want it to loop over each value of i and provide a different output value each time. I tried the code below but was left with a null value.

library(dplyr)
for (i in c(1,3,5,10)){
  df %>%
    filter(Category1=="FilterMe" & Category2=="i") %>%
    summarize(result=sum(Value))
}

If there is another way besides a loop that would fulfill the same objective, that is fine by me.

If I understood what you want to do, you are looking for group_by.

library(dplyr)
df %>%
   filter(Category1 =="FilterMe") %>%
   group_by(Category2) %>%
   summarize(result=sum(Value))

We don't need a loop. The filter can be simplified with %in% instead of ==, followed by the group_by and sum approach.

library(dplyr)
df %>%
  filter(Category1=="FilterMe" & Category2 %in% c(1, 3, 5, 10)) %>%
  group_by(Category2) %>%
  summarize(result=sum(Value))

-output

# A tibble: 4 × 2
  Category2 result
  <chr>      <int>
1 1              4
2 10            15
3 3             17
4 5             19
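
Note that Category2 is a character column, which is why the groups above come out in the order 1, 10, 3, 5. A minimal sketch, assuming numeric ordering is wanted: convert the column with as.integer() before grouping. The data frame df_fixed is a made-up, fixed stand-in for the random df, used only so the result is reproducible.

```r
library(dplyr)

# Hypothetical fixed data; Category2 is stored as character, as in the question
df_fixed <- data.frame(
  Category1 = c("FilterMe", "FilterMe", "DoNotFilterMe", "FilterMe", "FilterMe"),
  Category2 = c("10", "3", "1", "1", "5"),
  Value     = 1:5
)

out <- df_fixed %>%
  filter(Category1 == "FilterMe") %>%
  mutate(Category2 = as.integer(Category2)) %>%  # numeric, so groups sort 1, 3, 5, 10
  filter(Category2 %in% c(1, 3, 5, 10)) %>%
  group_by(Category2) %>%
  summarize(result = sum(Value))
out
```

With this stand-in data the groups are returned in the order 1, 3, 5, 10.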

With a for loop, we need to store the output of each iteration, e.g. in a list.

v1  <- c(1, 3, 5, 10)
lst1 <- vector('list', length(v1))
for (i in seq_along(v1)){
  lst1[[i]] <- df %>%
      filter(Category1=="FilterMe" & Category2 ==v1[i]) %>%
      summarize(result=sum(Value))

}

-output

> lst1
[[1]]
  result
1      4

[[2]]
  result
1     17

[[3]]
  result
1     19

[[4]]
  result
1     15
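
As for why the loop in the question shows nothing: inside a for loop R does not auto-print the value of an expression, and "i" in quotes is the literal string "i" rather than the loop variable. A rough sketch of a loop that loops over the values directly, prints each result, and keeps the sums in a named vector. The names df_fixed, vals and totals are made up for illustration, and the fixed data is only a reproducible stand-in for df.

```r
library(dplyr)

# Hypothetical fixed data standing in for the random `df`
df_fixed <- data.frame(
  Category1 = c("FilterMe", "FilterMe", "DoNotFilterMe", "FilterMe"),
  Category2 = c("1", "3", "1", "10"),
  Value     = 1:4
)

vals   <- c(1, 3, 5, 10)
totals <- setNames(numeric(length(vals)), vals)  # one named slot per value

for (i in vals) {
  res <- df_fixed %>%
    filter(Category1 == "FilterMe", Category2 == as.character(i)) %>%  # i, not "i"
    summarize(result = sum(Value))
  print(res)                               # explicit print: loops do not auto-print
  totals[as.character(i)] <- res$result    # keep the number for later use
}
totals
```

A value with no matching rows (5 in this stand-in data) gives a sum of 0 here, whereas the group_by approach simply has no row for it.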

Or we may directly store the output in a list with map / lapply.

library(purrr)
map(c(1, 3, 5, 10), ~ 
       df %>%
         filter(Category1 == "FilterMe", Category2 == .x) %>%
         summarise(result = sum(Value)))

-output

[[1]]
  result
1      4

[[2]]
  result
1     17

[[3]]
  result
1     19

[[4]]
  result
1     15
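
For completeness, the lapply variant mentioned above could look roughly like this. It is only a sketch: df_fixed, vals, res_list and res_df are made-up names, and the fixed data replaces the random df so the result is reproducible. The last step, stacking the list into one data frame with bind_rows, is an optional extra and not part of the original answer.

```r
library(dplyr)

# Hypothetical fixed data standing in for the random `df`
df_fixed <- data.frame(
  Category1 = c("FilterMe", "FilterMe", "DoNotFilterMe",
                "FilterMe", "FilterMe", "DoNotFilterMe"),
  Category2 = c("1", "3", "1", "1", "10", "5"),
  Value     = 1:6
)

vals <- c(1, 3, 5, 10)

# lapply returns one single-row data frame per value, like map()
res_list <- lapply(vals, function(v) {
  df_fixed %>%
    filter(Category1 == "FilterMe", Category2 == as.character(v)) %>%
    summarise(result = sum(Value))
})

# Name the elements, then stack them into one data frame
names(res_list) <- vals
res_df <- bind_rows(res_list, .id = "Category2")
res_df
```

The .id argument turns the list names into a Category2 column, so each sum stays labelled with the value it belongs to.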


 