
Optimise code to split large vector into smaller files?

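In my toy/example code below, I make a grid that covers the world, then use this grid to split a large, complex global dataset into one file per grid cell. In my real work this step is a bottleneck and takes a long time. I would appreciate some thoughts and ideas on optimising it. I have had some success with parallel processing, but I also think it could be done "smarter".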

library("sf")
library("terra")
library("glue")
library("rnaturalearth")
library("tidyverse")

dir.create("tmp")

ogr2ogr_path <- "C://Program Files//QGIS 3.26.1//bin//ogr2ogr.exe"

## Make some grid cells
world_grid <-  rast(nrows=1, ncols=1, xmin=-180, xmax=180, ymin=-90, ymax=90, crs="epsg:4326") %>%
                st_bbox() %>%
                st_as_sfc() %>%
                st_make_grid(cellsize = 10) %>%
                st_as_sf()

make_grid_cells <- function(grid_id, world_grid) {
    output_name <- glue("tmp/polygon_{grid_id}.gpkg")
    st_write(world_grid[grid_id,],
            output_name,
            append = FALSE,
            quiet = TRUE)
    return(output_name)
}

grid_cell <- lapply(1:nrow(world_grid), make_grid_cells, world_grid = world_grid)

## Get some sample data
ne_countries(type = "countries", scale = "large", returnclass = "sf") %>% 
            select(iso_a2) %>%
            st_write("tmp/world_polygons.gpkg", append = FALSE)

## Split the worldwide data into tiles
split_world_to_tiles <- function(tile_template_area, worldwide_data) {
    # Derive the output file name from the tile's file name
    output_name <- gsub("polygon", "worldwide_poly", tile_template_area)
    grid_poly <- st_read(tile_template_area, quiet = TRUE)
    box <- st_bbox(grid_poly)
    # Select features by the tile's extent (-spat) and clip them to it (-clipsrc spat_extent)
    command <- glue('{double_quote(ogr2ogr_path)} -spat {box$xmin} {box$ymin} {box$xmax} {box$ymax} -clipsrc spat_extent -f GPKG {output_name} {worldwide_data} -nlt GEOMETRYCOLLECTION')
    system(command)
}

split_worldwide_data <- lapply(grid_cell, split_world_to_tiles, worldwide_data = "tmp/world_polygons.gpkg")
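
The question mentions parallel processing but does not show how it was set up. As a rough sketch only (not necessarily what was actually used), the `lapply` call above could be spread over a socket cluster from the base `parallel` package, which also works on Windows. `run_tiles_parallel` and its `n_workers`, `packages` and `export` arguments are hypothetical names made up for this illustration.

```r
library(parallel)

# Hypothetical helper: apply `worker` to every element of `tiles`
# on a socket (PSOCK) cluster. Not part of the original post.
run_tiles_parallel <- function(tiles, worker, ..., n_workers = 2,
                               packages = character(), export = character()) {
    cl <- makeCluster(n_workers)
    on.exit(stopCluster(cl), add = TRUE)
    # Attach the packages the worker function relies on in each R process
    for (p in packages) {
        clusterCall(cl, library, p, character.only = TRUE)
    }
    # Copy any global variables the worker function refers to
    clusterExport(cl, export, envir = .GlobalEnv)
    parLapply(cl, tiles, worker, ...)
}

# Toy check that does not need ogr2ogr
res <- run_tiles_parallel(1:4, function(i, k) i * k, k = 10)
stopifnot(identical(unlist(res), c(10, 20, 30, 40)))
```

With the objects defined above, the call would presumably look something like `run_tiles_parallel(grid_cell, split_world_to_tiles, worldwide_data = "tmp/world_polygons.gpkg", packages = c("sf", "glue", "magrittr"), export = "ogr2ogr_path")`. Each worker writes its own output file, so the tiles do not compete for the same file on the write side.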

It seems that you can get more mileage out of a different file format.

library(terra)
dir.create("tmp", FALSE, FALSE)
d <- file.remove(list.files("tmp", full=TRUE))

wrldgrid <- as.polygons(rast(res=10))

write_cells <- function(wgrid, format=".gpkg") {
    nr <- nrow(wgrid)
    outf <- paste0("tmp/polygon_", 1:nr, format)
    for (i in 1:nr) {
        writeVector(wgrid[i,], outf[i])
    }
    invisible(outf)
}

system.time(f <- write_cells(wrldgrid))
#   user  system elapsed 
#   5.03   13.89   24.70 

system.time(f <- write_cells(wrldgrid, ".shp"))
#   user  system elapsed 
#   1.97    3.15    9.86 

If you are going to use these files in R, you might as well save them to ".rds".

write_rds <- function(wgrid) {
    nr <- nrow(wgrid)
    outf <- paste0("tmp/polygon_", 1:nr, ".rds")
    for (i in 1:nr) {
        saveRDS(wgrid[i,], outf[i])
    }
    invisible(outf)
}

system.time(f <- write_rds(wrldgrid))
#   user  system elapsed 
#   1.71    0.40    2.31 
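
For completeness, here is a small sketch of reading one of those ".rds" files back in. It assumes that, depending on the terra version, `readRDS` may hand back a packed object (`PackedSpatVector`) rather than a `SpatVector`, so it converts with `vect()` when needed. The file name `tmp/polygon_1.rds` just follows the naming pattern used above.

```r
library(terra)

dir.create("tmp", showWarnings = FALSE)

# Write a single grid cell, as in write_rds() above
wrldgrid <- as.polygons(rast(res = 10))
f <- file.path("tmp", "polygon_1.rds")
saveRDS(wrldgrid[1, ], f)

# Read it back; unpack if this terra version returns a packed object
cell <- readRDS(f)
if (inherits(cell, "PackedSpatVector")) {
    cell <- vect(cell)
}
stopifnot(inherits(cell, "SpatVector"), nrow(cell) == 1)
```

Note that ".rds" files are only readable from R, so this option fits best when the tiles never need to be opened in other GIS software.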

