简体   繁体   中英

How to add polygons to your data for a voronoi treemap in R?

I have a data frame that looks like this. It contains the sunflower seed productivity of each country. I want to add next to this data polygon data so I can plot it with ggplot2.

I was told to use this site: https://observablehq.com/@ladataviz/wip-voronoi-data-generator , I want to understand how I can create polygons and plot a circular voronoi diagram.

I have created a similar post in the past, but my question here is very different. I want to find a way to create the polygon data

df <- data.frame(country = c("Ukraine", "Russia", "Argentina", "China", "Romania", "Other"),
                 prod = c(11.0, 10.6, 3.1, 2.4, 2.1, 15.3))
df
#>     country prod
#> 1   Ukraine 11.0
#> 2    Russia 10.6
#> 3 Argentina  3.1
#> 4     China  2.4
#> 5   Romania  2.1
#> 6     Other 15.3

Created on 2023-01-20 with reprex v2.0.2

If add polygons to my data should look like this:

       x            y path   split   group value
1   472.0117 220.08122253    0 Ukraine Ukraine    11
2   471.8336 217.18476868    1 Ukraine Ukraine    11
3   471.6556 214.28833008    2 Ukraine Ukraine    11
4   471.4776 211.39187622    3 Ukraine Ukraine    11
5   471.2996 208.49542236    4 Ukraine Ukraine    11
6   471.1216 205.59896851    5 Ukraine Ukraine    11

I want my data to look like this.

在此处输入图像描述

There probably is a smart algorithm for this but here is how you could make such a diagram by brute force.

Your data

df <- data.frame(country = c("Ukraine", "Russia", "Argentina", "China", "Romania", "Other"),
                  prod = c(11.0, 10.6, 3.1, 2.4, 2.1, 15.3))

A function that finds a solution through optimization

library(terra)

vtreeMap <- function(d) {

    p <- vect(cbind(0,0), crs="+proj=utm +zone=1") |> buffer(1)
    A <- expanse(p) * d / sum(d)

    f <- function(xy) {
        if (any(xy > 1) || any(xy < -1)) return(Inf)
        xy <- vect(matrix(xy, ncol=2), crs=crs(p))
        e <- extract(p, xy)
        if (any(is.na(e[,2]))) return(Inf)
        v <- crop(voronoi(xy, bnd=p), p)
        mean( (A - expanse(v))^2 )
    }

    xy <- spatSample(p, length(A)) |> crds() |> as.vector()
    opt <- optim(xy, f)
    print(paste("MSE:", round(opt$value, 5)))
    vp <- vect(matrix(opt$par, ncol=2))
    crop(voronoi(vp, bnd=p), p)
}

Call the function

set.seed(3)
vp <- vtreeMap(df$prod)
[1] "MSE: 0.01187"

And plot it

library(RColorBrewer)
vp$country <- df$country
plot(vp, col=brewer.pal(6, "Set2"), axes=FALSE, lwd=4, border="white", mar=rep(0.1, 4))
text(vp, "country", halo=TRUE)

在此处输入图像描述

You may need to tweak the optimization procedure (different algorithm, additional options) a bit for the best result (low MSE).

For example, you may use

 opt <- optim(xy, f, method="BFGS", control=list(abstol=0.001, maxit=500))

If you do not like this particular solution, change the seed and try again until you find one that pleases you.

If you want to use ggplot2 you can do

library(tidyterra)
library(ggplot2)
ggplot(vp) + geom_spatvector(aes(fill = country)) + theme_void()

It's relatively easy to make a Voronoi tasselation in R, but it's harder to make a Voronoi treemap . The linked Q&A does it by using the voronoiTreemap package, which is essentially just a wrapper round a JavaScript library. As far as I can tell, this is the only published R package that generates Voronoi treemaps.

Our two options are to calculate the polygons ourselves from scratch, or somehow extract the polygons from the SVG output of voronoiTreemap .

With regards to the first option, this is not a trivial problem. To see just how complex it is, and also to get a fully worked solution in R, you can check out this fantastic article by Paul Murrell . The code runs to several pages and is over a decade old, so I'm not sure if all the dependencies still work. It's disappointing that no-one has put it all together in a package on CRAN, but perhaps it's a bit niche.

If you struggle with Paul Murrell's approach, you are left trying to harvest polygons from the output of voronoiTreemap . Although this package works well, the output does not lend itself to being harvested for polygons, and we do not get access to the intermediate calculations that would allow us to generate the polygons ourselves in R. It's not impossible, and there are a few ways to tackle it, but they are all fairly convoluted.

The following approach starts with plotting your treemap as normal with voronoiTreemap , but without the labels:

library(voronoiTreemap)
library(terra)
library(tidyverse)

df <- data.frame(country = c("Ukraine", "Russia", "Argentina", 
                             "China", "Romania", "Other"),
                 prod = c(11.0, 10.6, 3.1, 2.4, 2.1, 15.3))

vor <- data.frame(h1 = 'World', 
                  h2 = c('Europe', 'Europe', 'Americas', 'Asia',
                         'Europe', 'Other'),
                  h3 = df$country,
                  color = hcl.colors(nrow(df), palette = 'TealRose'),
                  weight = df$prod,
                  codes = "")

vt <- vt_input_from_df(vor)

v <- vt_d3(vt_export_json(vt))

v

在此处输入图像描述

Now click on Export -> Save as image and save your plot as Rplot.png

Now we can do

polygons <- rast('Rplot02.png')[[2]] %>% 
  app(fun = function(x) ifelse(x > 220, 255, 0)) %>%
  as.polygons() %>%
  sf::st_as_sf() %>% 
  filter(lyr.1 == 0) %>%
  sf::st_buffer(dist = -0.002) %>%
  sf::st_coordinates() %>%
  as.data.frame() %>%
  mutate(country = df$country[L2], prod = df$prod[L2]) %>%
  select(-(L1:L3))

Resulting in the following data frame with our polygons:

head(polygons)
#>           X         Y country prod
#> 1 0.6460000 0.3970068 Ukraine   11
#> 2 0.6460000 0.4054322 Ukraine   11
#> 3 0.6460501 0.4054499 Ukraine   11
#> 4 0.6461468 0.4054900 Ukraine   11
#> 5 0.6462413 0.4055351 Ukraine   11

And we can see that this is a data frame of polygons of the Voronoi treemap by doing:

ggplot(polygons, aes(X, Y, fill = country)) + 
  geom_polygon() +
  coord_fixed(0.52) +
  theme_void()

在此处输入图像描述

Having become hooked by this problem, I have written a small package to address it which you can install via

devtools::install_github("AllanCameron/VoronoiPlus")

It can deal with Voronoi maps (as in the question on this page) with a call to voronoi_map , which takes the weights and group labels. It can also take an arbitrary shape to act as the boundary of the tiles, though defaults to the unit circle if this is missing.

library(VoronoiPlus)

res <- voronoi_map(values = df$prod, groups = df$country)

plot(res)

在此处输入图像描述

You can extract the polygons as a data frame from this object with:

polys <- get_polygons(res)

head(polys)
#>   geom          x          y   group value
#> 1    1 -0.6436006 -0.7649495 Ukraine    11
#> 2    1 -0.6691306 -0.7431448 Ukraine    11
#> 3    1 -0.7071068 -0.7071068 Ukraine    11
#> 4    1 -0.7431448 -0.6691306 Ukraine    11
#> 5    1 -0.7771460 -0.6293204 Ukraine    11
#> 6    1 -0.8090170 -0.5877853 Ukraine    11

The package can also handle arbitrarily nested groups to produce a genuine treemap via the voronoi_treemap function, which employs a formula interface (the weights on the left and grouping variables on the right)

df$region <- c("Europe", "Europe", "Other", "Other", "Europe", "Other")

dat <- voronoi_treemap(prod ~ region + country, data = df)

head(dat)
#>           x          y  group value parent level
#> 1 0.6657716 -0.7460137 Europe  23.7   root     1
#> 2 0.6293204 -0.7771460 Europe  23.7   root     1
#> 3 0.5877853 -0.8090170 Europe  23.7   root     1
#> 4 0.5446390 -0.8386706 Europe  23.7   root     1
#> 5 0.5000000 -0.8660254 Europe  23.7   root     1
#> 6 0.4539905 -0.8910065 Europe  23.7   root     1

This allows a nested treemap as follows:

library(tidyverse)

ggplot(dat[dat$level == 2,], aes(x, y, label = group)) +
  geom_polygon(aes(fill = parent)) +
  geom_polygon(fill = "white", aes(group = group, alpha = group), 
               color = "black") +
  geom_text(data = . %>% group_by(group) %>% 
              summarize(x = mean(x), y = mean(y))) +
  scale_alpha_discrete(guide = "none") +
  coord_equal() +
  theme_void()

在此处输入图像描述

The algorithm used is also a brute-force method at present, though a little different to the one demonstrated by Robert Hijmans. I am working on a more directed method to improve convergence times.

A major caveat is that the package is in a nascent stage and has not been properly tested or documented at the time of writing.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM