Count combinations of variable levels in R
I wasn't exactly sure how to title this question, but here is what I am trying to do. I have a data frame with a "trip" column and a "species caught" column. I am trying to count the number of trips on which each species was caught together with my species of interest. For example, say 5 trips caught both my species of interest and species x. I have created a simplified example here:
trip = c(1,1,1,2,2,3,3,3,3,4,5)
color = c("red","orange","green","red","orange","orange","green","blue","purple","red","green")
dat = data.frame(trip, color)  # data.frame() keeps trip numeric; cbind() would coerce it to character
dat
> dat
   trip  color
1     1    red
2     1 orange
3     1  green
4     2    red
5     2 orange
6     3 orange
7     3  green
8     3   blue
9     3 purple
10    4    red
11    5  green
Say this is my data frame, and I want to count the number of trips that contain the color red together with each of the other colors. So I would end up with a data frame that looks like this:
color2 = c("orange","green","blue","purple")
trips.with.red = c(2,1,0,0)
dat2 = data.frame(color2, trips.with.red)  # keeps the counts numeric
dat2
> dat2
  color2 trips.with.red
1 orange              2
2  green              1
3   blue              0
4 purple              0
That is, for each of the other colors in the dataset, I get a column that shows the number of trips that contained both that particular color and red. Any advice on how to do this would be appreciated.
With dplyr, you can add an indicator for whether any row within a trip group has a color of red. Then, grouping by color, you can summarise the total of these trips.
library(dplyr)

dat %>%
  group_by(trip) %>%
  mutate(trip_with_red = any(color == "red")) %>%
  filter(color != "red") %>%
  group_by(color) %>%
  summarise(trips_with_red = sum(trip_with_red))
Output
  color  trips_with_red
  <chr>           <int>
1 blue                0
2 green               1
3 orange              2
4 purple              0
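For comparison, the same idea can be sketched in base R without dplyr. This is only an illustrative alternative, not part of the answer above; the names red_trips and others are made up for the example. It also assumes that a color repeated within one trip should count once, so it de-duplicates trip/color pairs first (the example data has no such repeats, so the result matches).

```r
# Rebuild the example data from the question
trip  <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5)
color <- c("red", "orange", "green", "red", "orange", "orange",
           "green", "blue", "purple", "red", "green")
dat <- data.frame(trip, color)

# Trips that caught the color of interest
red_trips <- unique(dat$trip[dat$color == "red"])

# One row per trip/color pair (assumption: repeats within a trip count once)
others <- unique(dat[dat$color != "red", ])

# For each remaining color, count the trips that also contain red
trips_with_red <- tapply(others$trip %in% red_trips, others$color, sum)

dat2 <- data.frame(color = names(trips_with_red),
                   trips_with_red = as.integer(trips_with_red))
dat2
```

Because tapply() groups by the sorted color names, the rows come out in alphabetical order, as in the dplyr output.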
Does this work:
> library(dplyr)
> library(purrr)  # provides map_dbl()
> dat %>% group_by(trip) %>% mutate(flag = map_dbl(color, ~ if(.x == 'red') 1 else 0)) %>%
+ mutate(flag = max(flag)) %>%
+ filter(color != 'red') %>% ungroup() %>% group_by(color) %>%
+ summarise(trips_with_red = sum(flag)) %>% arrange(desc(trips_with_red))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  color  trips_with_red
  <chr>           <dbl>
1 orange              2
2 green               1
3 blue                0
4 purple              0
>
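If the count is needed for every pair of colors rather than just red, one possible sketch (not taken from either answer) is to build a trip-by-color presence matrix and take its cross-product. The names present and co are hypothetical, chosen only for this example.

```r
# Rebuild the example data from the question
trip  <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 5)
color <- c("red", "orange", "green", "red", "orange", "orange",
           "green", "blue", "purple", "red", "green")
dat <- data.frame(trip, color)

# Trip x color matrix: 1 if the color occurred on the trip, 0 otherwise
present <- 1 * (unclass(table(dat$trip, dat$color)) > 0)

# Color x color matrix: entry [a, b] is the number of trips containing both a and b
co <- crossprod(present)

# Row for the color of interest, dropping its count with itself
co["red", colnames(co) != "red"]
```

The diagonal of co holds the number of trips on which each color appeared at all, which can be a useful sanity check.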