Split data into groups in R
My data frame looks like this:
plant distance
one 0
one 1
one 2
one 3
one 4
one 5
one 6
one 7
one 8
one 9
one 9.9
two 0
two 1
two 2
two 3
two 4
two 5
two 6
two 7
two 8
two 9
two 9.5
I want to split the distance values of each plant into groups by a fixed interval (for instance, interval = 3) and compute the percentage of observations that fall into each group. Finally, I want to plot the percentage of each group for each plant.
My code:
library(ggplot2)
library(dplyr)
library(scales)  # provides percent() for the y-axis labels

dat <- my_data %>%
  # labels = FALSE makes cut() return integer group codes
  mutate(group = factor(cut(distance, seq(0, max(distance), 3), labels = FALSE))) %>%
  group_by(plant, group) %>%
  summarise(percentage = n()) %>%
  mutate(percentage = percentage / sum(percentage))

p <- ggplot(dat, aes(x = plant, y = percentage, fill = group)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_y_continuous(labels = percent)
p
But in my plot, group 4 is missing.
I also found that dat is wrong: group 4 is NA.
The likely reason is that group 4 is shorter than interval = 3, so my question is: how can I fix this? Thank you in advance!
I have solved the problem. The reason is that cut(distance, seq(0, max(distance), 3), F) does not include the minimum and maximum values. The breaks are 0, 3, 6, 9, and by default cut() builds the intervals (0,3], (3,6], (6,9], so 0 and every distance above 9 fall outside all intervals and become NA.
Here is my solution:
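To see the cause in isolation, here is a minimal base R sketch that applies the same cut() call to the distances of plant "one" from the question. The variable names breaks and group are chosen only for illustration.

```r
# Distances for plant "one", copied from the question
distance <- c(0:9, 9.9)

# Breaks as built in the question: 0, 3, 6, 9
breaks <- seq(0, max(distance), 3)

# Default intervals are (0,3], (3,6], (6,9];
# labels = FALSE returns integer group codes instead of interval labels
group <- cut(distance, breaks, labels = FALSE)

# 0 and 9.9 lie outside every interval, so their group is NA
print(data.frame(distance, group))
```

The first row (distance 0) and the last row (distance 9.9) come back as NA, which is where the NA group in dat comes from.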
dat <- my_data %>%
  mutate(group = factor(cut(distance,
                            seq(from = min(distance), by = 3, length.out = n() / 3 + 1),
                            include.lowest = TRUE))) %>%
  count(plant, group) %>%
  group_by(plant) %>%
  mutate(percentage = n / sum(n))
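As a possible alternative, the upper break can be derived from max(distance) rather than from the row count. The sketch below is not the code from the post: it uses base R only, assumes the distances start at 0, and the names interval, upper, and breaks are introduced here for illustration.

```r
# Data copied from the question
my_data <- data.frame(
  plant = rep(c("one", "two"), each = 11),
  distance = c(0:9, 9.9, 0:9, 9.5)
)

interval <- 3  # bin width from the question

# Round the maximum up to the next multiple of the interval
# so the last break lies at or above max(distance)
upper <- ceiling(max(my_data$distance) / interval) * interval
breaks <- seq(0, upper, by = interval)  # 0, 3, 6, 9, 12

# include.lowest = TRUE closes the first interval on the left: [0,3]
my_data$group <- cut(my_data$distance, breaks, include.lowest = TRUE)

# Row-wise proportions: share of each group within each plant
percentage <- prop.table(table(my_data$plant, my_data$group), margin = 1)
print(round(percentage, 3))
```

With these breaks every distance lands in one of four groups, and the proportions in each row sum to 1, so the fourth group no longer shows up as NA.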