Count combinations of variable levels in R

I wasn't exactly sure how to title this question, but here is what I am trying to do. I have a dataframe with a "trip" column and a "species caught" column. I am trying to count the number of trips on which each species was caught together with my species of interest. For example, say 5 trips caught both my species of interest and species x. I have created a simplified example here:

trip = c(1,1,1,2,2,3,3,3,3,4,5)
color = c("red","orange","green","red","orange","orange","green","blue","purple","red","green")
dat = as.data.frame(cbind(trip,color))
dat

> dat
   trip  color
1     1    red
2     1 orange
3     1  green
4     2    red
5     2 orange
6     3 orange
7     3  green
8     3   blue
9     3 purple
10    4    red
11    5  green

Say this is my dataframe, and I want to count, for every other color, the number of trips that contain both red and that color. I would end up with a dataframe that looks like this:

color2 = c("orange","green","blue","purple")
trips.with.red = c(2,1,0,0)
dat2 = as.data.frame(cbind(color2,trips.with.red))
dat2

> dat2
  color2 trips.with.red
1 orange              2
2  green              1
3   blue              0
4 purple              0

For each of the other colors in the dataset, the trips.with.red column shows the number of trips that contained both that particular color and red. Any advice on how to do this would be appreciated.

With dplyr, you can add an indicator for whether any row within a trip group has a color of red. Then, after dropping the red rows and grouping by color, you can summarise the total of these trips.

library(dplyr)

dat %>%
  group_by(trip) %>%
  mutate(trip_with_red = any(color == "red")) %>%
  filter(color != "red") %>%
  group_by(color) %>%
  summarise(trips_with_red = sum(trip_with_red))

Output

  color  trips_with_red
  <chr>           <int>
1 blue                0
2 green               1
3 orange              2
4 purple              0
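
The same idea can also be sketched in base R, without dplyr. This is only an illustrative alternative, not part of the answer above: the names `red_trips`, `others`, and `result` are made up for the example, and the data is rebuilt with `data.frame()` so that `trip` stays numeric.

```r
# Rebuild the example data from the question (trip kept numeric here).
trip  <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5)
color <- c("red", "orange", "green", "red", "orange", "orange",
           "green", "blue", "purple", "red", "green")
dat <- data.frame(trip, color)

# Trips that contain at least one "red" row.
red_trips <- unique(dat$trip[dat$color == "red"])

# Every color except red, in order of first appearance.
others <- setdiff(unique(dat$color), "red")

# For each of those colors, count the distinct trips that also contain red.
trips_with_red <- sapply(others, function(col) {
  length(intersect(unique(dat$trip[dat$color == col]), red_trips))
})

result <- data.frame(color = others, trips_with_red = unname(trips_with_red))
result
```

One difference worth noting: this sketch counts distinct trips, whereas summing the per-row indicator counts rows. The two agree on the example data because no color appears more than once within a trip.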

Does this work:

> library(dplyr)
> library(purrr)  # map_dbl() comes from purrr
> dat %>%
+   group_by(trip) %>%
+   mutate(flag = map_dbl(color, ~ if (.x == 'red') 1 else 0)) %>%
+   mutate(flag = max(flag)) %>%
+   filter(color != 'red') %>%
+   ungroup() %>%
+   group_by(color) %>%
+   summarise(trips_with_red = sum(flag)) %>%
+   arrange(desc(trips_with_red))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  color  trips_with_red
  <chr>           <dbl>
1 orange              2
2 green               1
3 blue                0
4 purple              0
> 
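
If co-occurrence counts are wanted for every pair of colors rather than only for red, one possible approach is a trip-by-color presence matrix combined with `crossprod()`. This is a hedged sketch in base R, not something taken from the answers above; `presence` and `co` are illustrative names.

```r
# Rebuild the example data from the question.
trip  <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5)
color <- c("red", "orange", "green", "red", "orange", "orange",
           "green", "blue", "purple", "red", "green")
dat <- data.frame(trip, color)

# 1 if the color occurs at least once in the trip, 0 otherwise
# (rows are trips, columns are colors in alphabetical order).
presence <- (unclass(table(dat$trip, dat$color)) > 0) * 1

# t(presence) %*% presence: entry [a, b] is the number of trips
# that contain both color a and color b.
co <- crossprod(presence)

# The row for red answers the original question; the diagonal entry
# co["red", "red"] is simply the number of trips containing red.
co["red", c("orange", "green", "blue", "purple")]
```

Because the matrix records presence rather than row counts, a color that is repeated within a single trip still contributes only once to each pair.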
