Calculating convex hull for each group in R
I have the following data set:
data_sample <- structure(list(time = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L),
x = c(40.8914337158203, 20.0796813964844, 13.9093618392944,
17.1513957977295, 18.5109558105469, 40.7868537902832, 19.9750995635986,
13.804780960083, 16.8376483917236, 18.4063758850098, 40.6822700500488,
19.7659358978271, 13.7001991271973, 16.6284866333008, 18.3017921447754,
40.5776901245117, 19.66135597229, 13.5956182479858, 16.3147411346436,
18.1972122192383, 40.5776901245117, 19.5567722320557, 13.4910354614258,
16.1055774688721, 17.9880485534668), y = c(0.603550314903259,
-8.24852085113525, 9.65680503845215, -19.0118350982666, 6.43787002563477,
0.704141974449158, -8.34911251068115, 9.75739574432373, -19.2130165100098,
6.43787002563477, 0.704141974449158, -8.44970417022705, 9.75739574432373,
-19.5147914886475, 6.43787002563477, 0.704141974449158, -8.65088748931885,
9.85798835754395, -19.8165683746338, 6.33727836608887, 0.704141974449158,
-8.85207080841064, 9.85798835754395, -20.1183433532715, 6.33727836608887
), object = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -25L), .Names = c("time",
"x", "y", "object"))
Now, I would like to calculate a convex hull (using the chull function) for each value of time and store it within the same dataset (as I would then like to make a plot with ggplot2). I can use chull for a single time value with
chull(filter(data_sample, time == 1)$x, filter(data_sample, time == 1)$y)
which returns the vector 4 3 1. So I thought I could first group by time and then calculate the convex hull points within each group with something like
data_sample %>% group_by(time) %>% summarise(pts = chull(data_sample$x, data_sample$y))
The problem is that I cannot store a vector in a row. Storing each vertex in a separate column would be an option, but the following
data_sample %>% group_by(time) %>% summarise(pt1 = chull(data_sample$x, data_sample$y)[1])
doesn't give reasonable results. So my questions are:
1. How can I store a vector for each row within one column? I have read that tibbles can actually have a list column, but how can I create that in my case?
2. What's wrong with my attempt to calculate chull within each group?
Also, why does
data_sample %>% filter(time == 1) %>% chull(.$x, .$y)
not work? Is this because chull is not designed to work with pipes and dplyr?

Since chull gives you indices into the original data, you probably want to preserve the coordinates as you go, which means you probably should not be using summarize. I suggest you go with the "nested" concept as done with tidyr. The first step is nesting your data:
library(dplyr)  # for %>% and group_by()
library(tidyr)
data_sample %>%
  group_by(time) %>%
  nest()
# # A tibble: 5 × 2
#    time             data
#   <int>           <list>
# 1     1 <tibble [5 × 3]>
# 2     2 <tibble [5 × 3]>
# 3     3 <tibble [5 × 3]>
# 4     4 <tibble [5 × 3]>
# 5     5 <tibble [5 × 3]>
From here, it's just a matter of calculating the hull (which returns a vector of indices) and then extracting the relevant rows, in the order provided. This benefits from the map functions provided by purrr:
library(purrr)
data_sample %>%
  group_by(time) %>%
  nest() %>%
  mutate(
    hull = map(data, ~ with(.x, chull(x, y))),
    out = map2(data, hull, ~ .x[.y, , drop = FALSE])
  )
# # A tibble: 5 × 4
#    time             data      hull              out
#   <int>           <list>    <list>           <list>
# 1     1 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 2     2 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 3     3 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 4     4 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 5     5 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
(You should be able to get away with putting both assignments into a single mutate.)
From here, you can turn it into the coordinates you need by removing the now-unnecessary columns and unnesting:
data_sample %>%
  group_by(time) %>%
  nest() %>%
  mutate(
    hull = map(data, ~ with(.x, chull(x, y))),
    out = map2(data, hull, ~ .x[.y, , drop = FALSE])
  ) %>%
  select(-data) %>%
  unnest()  # tidyr >= 1.0 expects the columns to be named, e.g. unnest(c(hull, out))
# # A tibble: 15 × 5
#     time  hull        x           y object
#    <int> <int>    <dbl>       <dbl>  <int>
# 1      1     4 17.15140 -19.0118351      4
# 2      1     3 13.90936   9.6568050      3
# 3      1     1 40.89143   0.6035503      1
# 4      2     4 16.83765 -19.2130165      4
# 5      2     3 13.80478   9.7573957      3
# 6      2     1 40.78685   0.7041420      1
# 7      3     4 16.62849 -19.5147915      4
# 8      3     3 13.70020   9.7573957      3
# 9      3     1 40.68227   0.7041420      1
# 10     4     4 16.31474 -19.8165684      4
# 11     4     3 13.59562   9.8579884      3
# 12     4     1 40.57769   0.7041420      1
# 13     5     4 16.10558 -20.1183434      4
# 14     5     3 13.49104   9.8579884      3
# 15     5     1 40.57769   0.7041420      1
(I kept hull here for demonstration purposes; you can probably use select(-data, -hull) above since you'll have what you need, especially if it is redundant with object.)
For your last question, you could have done either of these:
filter(data_sample, time == 1) %>%
  with(., chull(x, y))
with(filter(data_sample, time == 1), chull(x, y))
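As for why the piped call in the question fails: as far as I understand magrittr, %>% still inserts the left-hand side as the first argument when the dot only appears inside nested calls such as .$x, so chull ends up being called with three arguments instead of two. Wrapping the call in braces suppresses that insertion. A minimal sketch, using rounded copies of the time == 1 points from the question (the names pts and hull_idx are mine, not from the original post):

```r
library(dplyr)  # provides %>% and filter()

# Rounded copies of the five time == 1 points from the question
pts <- data.frame(
  time = 1L,
  x = c(40.89, 20.08, 13.91, 17.15, 18.51),
  y = c(0.60, -8.25, 9.66, -19.01, 6.44)
)

# Braces stop %>% from passing the data frame as chull's first argument,
# so only the two vectors we name explicitly are handed over.
hull_idx <- pts %>%
  filter(time == 1) %>%
  { chull(.$x, .$y) }

# The question reports 4 3 1 for the unrounded data
print(hull_idx)
```

The with() versions above do the same job and are arguably easier to read; the braces are only worth it if you want to stay inside a longer pipe.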

You can simply wrap the chull call in a list:
df <- df %>%
  group_by(time) %>%
  mutate(chull_val = list(chull(x, y)))
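Note that with mutate every row of a group carries the same vector of hull indices. If one row per time value is preferred, the same list() wrapper should also work inside summarise. A small sketch on made-up data (the toy data frame and the hulls name are hypothetical, chosen only for illustration):

```r
library(dplyr)

# Hypothetical toy data: two time groups of four points each;
# the fourth point of each group lies inside the triangle of the first three
toy <- data.frame(
  time = rep(1:2, each = 4),
  x = c(0, 1, 1, 0.5,  0, 2, 2, 1),
  y = c(0, 0, 1, 0.25, 0, 0, 2, 0.5)
)

# list() wraps each group's index vector into a single list-column cell,
# which is what lets summarise keep one row per group
hulls <- toy %>%
  group_by(time) %>%
  summarise(hull = list(chull(x, y)))

print(hulls)
```

The indices in each cell refer to row positions within that group, not within the whole data frame, so they need to be applied to the group's own rows.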

If you don't want to work with list columns*, you may consider using (the more flexible) data.table.
library(data.table)
setDT(d)
d[d[ , .I[chull(x, y)], by = time]$V1]
Explanation: convert your data to a data.table (setDT(d)). For each time (by = time), calculate the chull indices and use .I to turn them into row numbers of the full table, which then select the corresponding rows.
If you want to plot the chull polygons, you need to add the first index to close the polygon.
d2 <- d[ , {
  # for each time (by = time):
  # compute the indices lying on the convex hull
  ix <- chull(x, y)
  # use indices to select data of each subset (.SD)
  # possibly also add the first coordinate to close the polygon for plotting
  .SD[c(ix, ix[1])]}, by = time]

# plot chull and original polygons
library(ggplot2)
ggplot(d2, aes(x, y, fill = factor(time))) +
  geom_polygon(alpha = 0.2) +
  geom_polygon(data = d, alpha = 0.2)
*Related dplyr issues: Summarising verbs with variable-length outputs, Optional parameter to control length of summarise.