R: how to find the ratio when using dplyr
I want to add a column holding the ratio of each (type, genre) count to the total count of all rows that share the same type. For example, for (type, genre) = (1, 0) the ratio is n / sum(n for the same type) = 2/3.
library(dplyr)

coco <- data.frame(type = c(1,2,1,2,3,1,2,3,4,4), genre = c(0,1,0,1,1,1,0,0,1,0))
coco %>% group_by(type, genre) %>% summarise(n = n())
# A tibble: 8 x 3
# Groups: type [4]
type genre n
<dbl> <dbl> <int>
1 1 0 2
2 1 1 1
3 2 0 1
4 2 1 2
5 3 0 1
6 3 1 1
7 4 0 1
8 4 1 1
coco%>%count(type)
type n
1 1 3
2 2 3
3 3 2
4 4 2
I tried to use:
coco%>%group_by(type,genre)%>%summarise(n=n(),ratio=n/sum(type))
but it didn't work. The output should look like this:
type genre n ratio
<dbl> <dbl> <int>
1 1 0 2 0.66
2 1 1 1 0.33
3 2 0 1 0.33
4 2 1 2 0.66
5 3 0 1 0.5
6 3 1 1 0.5
7 4 0 1 0.5
8 4 1 1 0.5
Which part should I modify? (Sorry for the poor explanation, and thanks in advance.)
A shortcut for group_by(x) %>% summarize(n = n()) is count(x).
Your code would work if you modified it to:
coco%>%group_by(type,genre)%>%summarise(n=n()) %>% mutate(ratio=n/sum(n))
The summarise line drops only the last grouping variable (genre) and leaves the type grouping intact. You can then feed the result into mutate, where each n is compared to the total n for its type group.
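To check which grouping survives the summarise step, something like the following sketch may help. It assumes dplyr 1.0.0 or later, since the .groups argument and the intermediate names counts and result are not part of the original answer; they are only here to make the behaviour explicit.

```r
library(dplyr)

coco <- data.frame(type  = c(1,2,1,2,3,1,2,3,4,4),
                   genre = c(0,1,0,1,1,1,0,0,1,0))

# Count rows per (type, genre) pair. .groups = "drop_last" spells out what
# summarise does here by default: it removes only the last grouping
# variable (genre), so the result is still grouped by type.
counts <- coco %>%
  group_by(type, genre) %>%
  summarise(n = n(), .groups = "drop_last")

group_vars(counts)  # "type"

# Because the data is still grouped by type, sum(n) is the per-type total.
result <- counts %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

result
```

Calling ungroup() at the end is optional, but it avoids carrying the type grouping into later steps by accident.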
Here's another way, which I slightly prefer because the type grouping is written explicitly. (I have made mistakes before by not realizing what level of grouping remained after a group_by - summarize ...)
coco %>%
count(type, genre) %>%
group_by(type) %>%
mutate(ratio = n / sum(n)) %>%
ungroup()
# A tibble: 8 x 4
type genre n ratio
<dbl> <dbl> <int> <dbl>
1 1 0 2 0.667
2 1 1 1 0.333
3 2 0 1 0.333
4 2 1 2 0.667
5 3 0 1 0.5
6 3 1 1 0.5
7 4 0 1 0.5
8 4 1 1 0.5
I hope I understood what you have in mind:
coco %>%
add_count(type) %>%
arrange(type) %>%
group_by(genre, type) %>%
mutate(avg = n() / n)
# A tibble: 10 x 4
# Groups: genre, type [8]
type genre n avg
<dbl> <dbl> <int> <dbl>
1 1 0 3 0.667
2 1 0 3 0.667
3 1 1 3 0.333
4 2 1 3 0.667
5 2 1 3 0.667
6 2 0 3 0.333
7 3 1 2 0.5
8 3 0 2 0.5
9 4 1 2 0.5
10 4 0 2 0.5
To get the desired result, divide n by sum(n) rather than by sum(type). This has to happen after summarise, because inside summarise the data is grouped by both type and genre, not by type alone.
Please check my code:
coco %>%
  group_by(type, genre) %>%
  summarise(n = n()) %>%
  mutate(ratio = n / sum(n))
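To see why the attempt in the question gives the wrong numbers, here is a small sketch that runs it next to the corrected version. The names attempt and fixed are made up for illustration, and .groups assumes dplyr 1.0.0 or later. Inside summarise each group is one (type, genre) pair, so sum(type) adds up the type value once per row in that pair. That equals n * type, so the "ratio" collapses to 1 / type.

```r
library(dplyr)

coco <- data.frame(type  = c(1,2,1,2,3,1,2,3,4,4),
                   genre = c(0,1,0,1,1,1,0,0,1,0))

# Original attempt: sum(type) is evaluated within each (type, genre) group,
# so it sums the type values of that group's rows, not the row counts.
attempt <- coco %>%
  group_by(type, genre) %>%
  summarise(n = n(), ratio = n / sum(type), .groups = "drop")

# Corrected version: compute the counts first, then divide by the
# per-type total of those counts.
fixed <- coco %>%
  group_by(type, genre) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

attempt$ratio  # equals 1 / attempt$type for every row
fixed$ratio    # per-type shares, summing to 1 within each type
```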