R: how to find the ratio when using dplyr

I want to add a column holding the ratio of each element's count to the total count of elements that share the same type. For example, for (type, genre) = (1, 0), the ratio is n / sum(n for the same type) = 2/3.

library(dplyr)

coco <- data.frame(type = c(1,2,1,2,3,1,2,3,4,4), genre = c(0,1,0,1,1,1,0,0,1,0))

coco %>% group_by(type, genre) %>% summarise(n = n())

 # A tibble: 8 x 3
# Groups:   type [4]
  type genre     n
    <dbl> <dbl> <int>
1     1     0     2
2     1     1     1
3     2     0     1
4     2     1     2
5     3     0     1
6     3     1     1
7     4     0     1
8     4     1     1

coco%>%count(type)
  type n
1    1 3
2    2 3
3    3 2
4    4 2

I tried:

 coco%>%group_by(type,genre)%>%summarise(n=n(),ratio=n/sum(type))

but it didn't work. The output should look like this:

    type genre     n  ratio
    <dbl> <dbl> <int>
1     1     0     2    0.66
2     1     1     1    0.33
3     2     0     1    0.33
4     2     1     2    0.66
5     3     0     1    0.5
6     3     1     1    0.5
7     4     0     1    0.5
8     4     1     1    0.5

Which part should I modify? (Sorry for the poor explanation, and thanks in advance.)

A shortcut for group_by(x) %>% summarize(n = n()) is count(x).

Your code would work if you modified it to:

coco%>%group_by(type,genre)%>%summarise(n=n()) %>% mutate(ratio=n/sum(n))

The summarise line drops only the last grouping variable (genre) and leaves the type grouping intact, so you can pipe the result into mutate, where each n is compared with the total n for that type group.
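To make the remaining grouping visible, one can inspect it with group_vars(). This is only a sketch, assuming dplyr 1.0.0 or later (where summarise has a `.groups` argument); `.groups = "drop_last"` simply spells out the default, and the names `counts` and `result` are introduced here for illustration.

```r
library(dplyr)

coco <- data.frame(type  = c(1, 2, 1, 2, 3, 1, 2, 3, 4, 4),
                   genre = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0))

# summarise() peels off the last grouping variable (genre);
# .groups = "drop_last" makes that default explicit (assumes dplyr >= 1.0.0).
counts <- coco %>%
  group_by(type, genre) %>%
  summarise(n = n(), .groups = "drop_last")

# Shows which grouping variables are still attached after summarise().
group_vars(counts)

# The data are still grouped by type, so sum(n) is the per-type total.
result <- counts %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

result
```

Calling ungroup() at the end is optional, but it avoids carrying the type grouping into later steps by accident.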

Here's another way, which I slightly prefer since the type grouping is written explicitly. (I have made mistakes before by not realizing what level of grouping remained after a group_by - summarize ...)

coco %>%
  count(type, genre) %>%
  group_by(type) %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

# A tibble: 8 x 4
   type genre     n ratio
  <dbl> <dbl> <int> <dbl>
1     1     0     2 0.667
2     1     1     1 0.333
3     2     0     1 0.333
4     2     1     2 0.667
5     3     0     1 0.5  
6     3     1     1 0.5  
7     4     0     1 0.5  
8     4     1     1 0.5  
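As a quick sanity check on the table above, the ratios within each type should add up to 1. The sketch below also cross-checks one value against base R's prop.table(), which is not part of the original answer; the names `ratios`, `check`, and `props` are introduced here for illustration.

```r
library(dplyr)

coco <- data.frame(type  = c(1, 2, 1, 2, 3, 1, 2, 3, 4, 4),
                   genre = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0))

ratios <- coco %>%
  count(type, genre) %>%
  group_by(type) %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

# Each type's ratios should sum to 1.
check <- ratios %>%
  group_by(type) %>%
  summarise(total = sum(ratio))

# Base R cross-check: row-wise proportions of the type x genre table
# (margin = 1 normalises within each row, i.e. within each type).
props <- prop.table(table(coco$type, coco$genre), margin = 1)

check
props
```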

I hope I got what you have in mind right:

coco %>%
  add_count(type) %>%
  arrange(type) %>%
  group_by(genre, type) %>%
  mutate(avg = n() / n)

# A tibble: 10 x 4
# Groups:   genre, type [8]
    type genre     n   avg
   <dbl> <dbl> <int> <dbl>
 1     1     0     3 0.667
 2     1     0     3 0.667
 3     1     1     3 0.333
 4     2     1     3 0.667
 5     2     1     3 0.667
 6     2     0     3 0.333
 7     3     1     2 0.5  
 8     3     0     2 0.5  
 9     4     1     2 0.5  
10     4     0     2 0.5 
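Note that this approach keeps all 10 original rows, so each (type, genre) pair is repeated. If one row per pair is wanted, as in the question's expected output, one possible follow-up (an assumption about the goal, not part of the original answer) is to collapse the result with distinct(); `per_row` and `per_pair` are names introduced here for illustration.

```r
library(dplyr)

coco <- data.frame(type  = c(1, 2, 1, 2, 3, 1, 2, 3, 4, 4),
                   genre = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0))

per_row <- coco %>%
  add_count(type) %>%           # n = number of rows per type, on every row
  group_by(genre, type) %>%
  mutate(avg = n() / n) %>%     # n() = number of rows in the (genre, type) group
  ungroup()

# Collapse the repeated rows to one row per (type, genre) pair.
per_pair <- per_row %>%
  distinct(type, genre, avg) %>%
  arrange(type, genre)

per_pair
```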

You need to divide n by sum(n), the total count for that type, to get your desired results, because the data was not grouped by type only.

Kindly check my code:

coco %>% group_by(type, genre) %>%
  summarise(n = n()) %>%
  mutate(ratio = n / sum(n))
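For comparison, here is a sketch of what the original attempt appears to compute. Inside summarise the data are still grouped by both type and genre, so sum(type) adds up the type values of the rows in that group (type times n) rather than the number of rows sharing the type, and the ratio works out to 1/type. The names `attempt` and `fixed` are introduced here for illustration, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

coco <- data.frame(type  = c(1, 2, 1, 2, 3, 1, 2, 3, 4, 4),
                   genre = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0))

# Original attempt: sum(type) is evaluated per (type, genre) group,
# so it equals type * n and the "ratio" collapses to 1 / type.
attempt <- coco %>%
  group_by(type, genre) %>%
  summarise(n = n(), ratio = n / sum(type), .groups = "drop")

# Corrected version: compute n first, then divide by the per-type total of n.
fixed <- coco %>%
  group_by(type, genre) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(ratio = n / sum(n)) %>%
  ungroup()

attempt
fixed
```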
