
Split data into groups in R

My data frame looks like this:

plant   distance
one 0
one 1
one 2
one 3
one 4
one 5
one 6
one 7
one 8
one 9
one 9.9
two 0
two 1
two 2
two 3
two 4
two 5
two 6
two 7
two 8
two 9
two 9.5

I want to split the distance values of each plant level into groups by a fixed interval (for instance, interval = 3) and compute the percentage of observations in each group. Finally, I want to plot the percentage of each group for each level, similar to this:

[image]

My code:

library(ggplot2)
library(dplyr)
library(scales)  # provides percent() for the y-axis labels

dat <- data %>%
  # bin distance into intervals of width 3; labels = FALSE returns integer group codes
  mutate(group = factor(cut(distance, seq(0, max(distance), 3), labels = FALSE))) %>%
  group_by(plant, group) %>%
  summarise(percentage = n()) %>%
  mutate(percentage = percentage / sum(percentage))
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_y_continuous(labels = percent)
p

But in my plot, shown below, group 4 is missing.

[image]

I also found that dat was wrong: group 4 was NA.

[image]

The likely reason is that the length of group 4 is less than interval = 3, so my question is: how do I fix it? Thank you in advance!

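I have solved the problem. The reason is that the cut() call with breaks seq(0, max(distance), 3) did not include the minimum and maximum values: the breaks are 0, 3, 6, 9, and cut() builds intervals that are open on the left by default, so a distance of 0 and any distance above 9 fall outside every interval and become NA.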
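The NA behaviour can be checked on a small vector in base R. This is only an illustrative sketch: the vector x and the variable names are made up for the example, and rounding the upper break up to a multiple of 3 is one possible way to cover the maximum, not necessarily the approach used in the solution.

```r
# Hypothetical sample values covering both edges of the range
x <- c(0, 1, 3, 9, 9.9)

# Breaks as in the original code: 0, 3, 6, 9
naive <- cut(x, seq(0, max(x), 3), labels = FALSE)
print(naive)  # NA 1 1 3 NA: 0 and 9.9 fall outside (0,3], (3,6], (6,9]

# Extend the last break past the maximum and close the first interval on the left
upper <- ceiling(max(x) / 3) * 3  # 12
fixed <- cut(x, seq(0, upper, by = 3), labels = FALSE, include.lowest = TRUE)
print(fixed)  # 1 1 1 3 4
```

Note that include.lowest = TRUE alone only rescues the value 0; the value 9.9 still needs a break above it.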

Here is my solution:

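dat <- my_data %>%
  # breaks start at min(distance) and step by 3; include.lowest = TRUE keeps the minimum
  mutate(group = factor(cut(distance,
                            seq(from = min(distance), by = 3, length.out = n() / 3 + 1),
                            include.lowest = TRUE))) %>%
  count(plant, group) %>%  # number of rows per plant and group
  group_by(plant) %>%
  mutate(percentage = n / sum(n))  # share of each group within its plant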
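For completeness, here is a hedged end-to-end sketch that rebuilds the example data and draws the stacked percentage plot. It is a variant, not the solution above verbatim: it assumes distances start at 0, computes the upper break by rounding the maximum up to a multiple of the interval (the names interval and upper are introduced here for illustration), and uses geom_col() with scales::percent for the axis labels.

```r
library(dplyr)
library(ggplot2)

# Rebuild the example data from the question
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)

interval <- 3
# Assumption: distances are non-negative, so breaks can start at 0
upper <- ceiling(max(my_data$distance) / interval) * interval

dat <- my_data %>%
  mutate(group = cut(distance, seq(0, upper, by = interval),
                     include.lowest = TRUE)) %>%
  count(plant, group) %>%          # rows per plant and group
  group_by(plant) %>%
  mutate(percentage = n / sum(n)) %>%  # share within each plant
  ungroup()

# Stacked bars, one per plant, with the y axis shown as percentages
p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_col(position = "stack") +
  scale_y_continuous(labels = scales::percent)
p
```

Because the last break lies above the maximum distance, every row lands in a group and the NA level no longer appears in the legend.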
