
Calculating convex hull for each group in R

I have the following data set:

structure(list(time = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L), 
x = c(40.8914337158203, 20.0796813964844, 13.9093618392944, 
17.1513957977295, 18.5109558105469, 40.7868537902832, 19.9750995635986, 
13.804780960083, 16.8376483917236, 18.4063758850098, 40.6822700500488, 
19.7659358978271, 13.7001991271973, 16.6284866333008, 18.3017921447754, 
40.5776901245117, 19.66135597229, 13.5956182479858, 16.3147411346436, 
18.1972122192383, 40.5776901245117, 19.5567722320557, 13.4910354614258, 
16.1055774688721, 17.9880485534668), y = c(0.603550314903259, 
-8.24852085113525, 9.65680503845215, -19.0118350982666, 6.43787002563477, 
0.704141974449158, -8.34911251068115, 9.75739574432373, -19.2130165100098, 
6.43787002563477, 0.704141974449158, -8.44970417022705, 9.75739574432373, 
-19.5147914886475, 6.43787002563477, 0.704141974449158, -8.65088748931885, 
9.85798835754395, -19.8165683746338, 6.33727836608887, 0.704141974449158, 
-8.85207080841064, 9.85798835754395, -20.1183433532715, 6.33727836608887
), object = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -25L), .Names = c("time", 
"x", "y", "object"))

Now I would like to calculate a convex hull (using the chull function) for each value of time and store it within the same dataset (as I would then like to make a plot with ggplot2). I can call chull for a single time value with

chull(filter(data_sample, time == 1)$x, filter(data_sample, time == 1)$y)

which returns the vector 4 3 1. So I thought that I could first group by time and then calculate the convex hull points within each group with something like

data_sample %>% group_by(time) %>% summarise(pts = chull(data_sample$x, data_sample$y))

The problem is that I cannot store a vector in a row. Storing each of the vertices in a separate column would be an option, but the following

data_sample %>% group_by(time) %>% summarise(pt1 = chull(data_sample$x, data_sample$y)[1])

doesn't give reasonable results. So my questions are:

1. How can I store a vector for each row within one column? I have read that tibbles can actually have a list column, but how can I create one in my case?
2. What's wrong with my attempt to calculate chull within each group?

  • (extra question, if I may) Why doesn't data_sample %>% filter(time == 1) %>% chull(.$x, .$y) actually work? Is this because chull is not designed to work with pipes and dplyr?

Since chull gives you indices into the original data, you probably want to preserve the coordinates as you go, which means you probably should not be using summarize. I suggest you go with the "nested" concept as done with tidyr. The first step is nesting your data:

library(dplyr)  # provides %>% and group_by()
library(tidyr)
data_sample %>%
  group_by(time) %>%
  nest()
# # A tibble: 5 × 2
#    time             data
#   <int>           <list>
# 1     1 <tibble [5 × 3]>
# 2     2 <tibble [5 × 3]>
# 3     3 <tibble [5 × 3]>
# 4     4 <tibble [5 × 3]>
# 5     5 <tibble [5 × 3]>

From here, it's just a matter of calculating the hull (which will return a vector of indices) and then outputting the relevant rows, in the order provided. This will benefit from the map functions provided by purrr:

library(purrr)
data_sample %>%
  group_by(time) %>%
  nest() %>%
  mutate(
    hull = map(data, ~ with(.x, chull(x, y))),
    out = map2(data, hull, ~ .x[.y,,drop=FALSE])
  )
# # A tibble: 5 × 4
#    time             data      hull              out
#   <int>           <list>    <list>           <list>
# 1     1 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 2     2 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 3     3 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 4     4 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>
# 5     5 <tibble [5 × 3]> <int [3]> <tibble [3 × 3]>

(You should be able to get away with putting both assignments into a single mutate.)
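To see what the two map steps do for a single group, here is a minimal base-R sketch. The data frame `one` is a hypothetical toy example (four corners of a square plus one interior point), not a subset of the question's data.

```r
# Hypothetical toy group: four corners of a unit square and one interior point.
one <- data.frame(
  x = c(0, 1, 1, 0, 0.5),
  y = c(0, 0, 1, 1, 0.5)
)

# Equivalent of the `hull` step: positions of the hull vertices within this group.
idx <- with(one, chull(x, y))

# Equivalent of the `out` step: the corresponding rows, in hull order.
# drop = FALSE keeps the result a data frame even if only one column remained.
hull_rows <- one[idx, , drop = FALSE]
```

The interior point (row 5) should not appear in `hull_rows`; the nested pipeline simply repeats these two steps once per value of time.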

From here, you can turn it into the coordinates you need by removing the now-unnecessary columns and unnesting:

data_sample %>%
  group_by(time) %>%
  nest() %>%
  mutate(
    hull = map(data, ~ with(.x, chull(x, y))),
    out = map2(data, hull, ~ .x[.y,,drop=FALSE])
  ) %>%
  select(-data) %>%
  unnest()
# # A tibble: 15 × 5
#     time  hull        x           y object
#    <int> <int>    <dbl>       <dbl>  <int>
# 1      1     4 17.15140 -19.0118351      4
# 2      1     3 13.90936   9.6568050      3
# 3      1     1 40.89143   0.6035503      1
# 4      2     4 16.83765 -19.2130165      4
# 5      2     3 13.80478   9.7573957      3
# 6      2     1 40.78685   0.7041420      1
# 7      3     4 16.62849 -19.5147915      4
# 8      3     3 13.70020   9.7573957      3
# 9      3     1 40.68227   0.7041420      1
# 10     4     4 16.31474 -19.8165684      4
# 11     4     3 13.59562   9.8579884      3
# 12     4     1 40.57769   0.7041420      1
# 13     5     4 16.10558 -20.1183434      4
# 14     5     3 13.49104   9.8579884      3
# 15     5     1 40.57769   0.7041420      1

(I kept hull here for demonstration purposes; you can probably use select(-data, -hull) above since you'll have what you need, especially if it is redundant with object.)
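Since the stated goal is a ggplot2 plot, the hull rows can be handed straight to geom_polygon. The following is only a sketch under assumptions: `toy` is hypothetical data (two squares, each with one interior point), and dplyr's slice() is used as a compact stand-in for the nest/map/unnest steps above, not as the method from this answer.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: two time steps, each a square plus one interior point.
toy <- tibble(
  time = rep(1:2, each = 5),
  x = c(0, 1, 1, 0, 0.5,  2, 4, 4, 2, 3),
  y = c(0, 0, 1, 1, 0.5,  2, 2, 4, 4, 3)
)

# Keep only the hull vertices of each group, in the order chull() returns them.
hulls <- toy %>%
  group_by(time) %>%
  slice(chull(x, y)) %>%
  ungroup()

# All points, with one translucent hull polygon per time step.
p <- ggplot(toy, aes(x, y, colour = factor(time))) +
  geom_point() +
  geom_polygon(data = hulls, aes(fill = factor(time)), alpha = 0.2)
```

Printing `p` draws the plot. The same geom_polygon layer should also work on the unnested result of the pipeline above, since it contains time, x and y in hull order.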

For your last question, you could have done either one of these:

filter(data_sample, time == 1) %>%
  with(., chull(x, y))
with(filter(data_sample, time == 1), chull(x, y))

You can simply wrap the chull call inside a list:

df <- df %>% 
  group_by(time) %>% 
  mutate(chull_val = list(chull(x,y)))
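Note that x and y are written as bare column names here, so they are evaluated within each group. This is probably what went wrong in the question's attempt: an expression such as data_sample$x always refers to the full column, regardless of group_by. A minimal sketch of the difference, on hypothetical toy data (`toy`, `whole` and `per_group` are illustrative names):

```r
library(dplyr)

# Hypothetical toy data: two time steps, each a square plus one interior point.
toy <- tibble(
  time = rep(1:2, each = 5),
  x = c(0, 1, 1, 0, 0.5,  2, 4, 4, 2, 3),
  y = c(0, 0, 1, 1, 0.5,  2, 2, 4, 4, 3)
)

# toy$x is the full 10-element column, so every group receives the hull
# of the whole data set.
whole <- toy %>%
  group_by(time) %>%
  summarise(hull = list(chull(toy$x, toy$y)))

# Bare names are evaluated per group; list() wraps each index vector so
# that it fits into a single cell of a list column.
per_group <- toy %>%
  group_by(time) %>%
  summarise(hull = list(chull(x, y)))
```

With summarise there is one row (and one index vector) per time value; with mutate, as above, every row of a group carries the same vector.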

If you don't want to work with list columns*, you may consider using (the more flexible) data.table.

library(data.table)
setDT(d)
d[d[ , .I[chull(x, y)], by = time]$V1]

Explanation: convert your data to a data.table (setDT(d)). For each time (by = time), calculate the chull indices and use .I to translate them into row numbers of the full table; the outer d[...] then selects those rows.
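The one-liner can be split into its two steps to make the role of .I visible. This is a sketch on a hypothetical toy table (two squares, each with one interior point); `rows` and `hull_pts` are illustrative names.

```r
library(data.table)

# Hypothetical toy data: two time steps, each a square plus one interior point.
d <- data.table(
  time = rep(1:2, each = 5),
  x = c(0, 1, 1, 0, 0.5,  2, 4, 4, 2, 3),
  y = c(0, 0, 1, 1, 0.5,  2, 2, 4, 4, 3)
)

# Step 1: within each group, chull() returns positions relative to the group;
# indexing .I with them converts these into row numbers of the full table.
rows <- d[, .I[chull(x, y)], by = time]

# Step 2: the unnamed result column is called V1; use it to subset d.
hull_pts <- d[rows$V1]
```

For the second group, chull() only ever sees five points, so without .I its indices would point at rows 1 to 5 of the full table instead of rows 6 to 10.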


If you want to plot the chull polygons, you need to append the first index again to close the polygon.

d2 <- d[ , {

  # for each time (by = time):
  # compute the indices lying on the convex hull  
  ix <- chull(x, y)

  # use indices to select data of each subset (.SD)
  # possibly also add the first coordinate to close the polygon for plotting   
  .SD[c(ix, ix[1])]}, by = time]


# plot chull and original polygons
library(ggplot2) 
ggplot(d2, aes(x, y, fill = factor(time))) +
  geom_polygon(alpha = 0.2) +
  geom_polygon(data = d, alpha = 0.2)



*Related dplyr issues: "Summarising verbs with variable-length outputs", "Optional parameter to control length of summarise".
