Using dplyr distinct to ignore the geometry of an sf object in R

Question

I have a dataset with multiple polygons in different locations that share the same attributes. I only want one polygon in my dataset for each set of unique attributes (in my example below, that would be Area and Zone). I don't care where the polygons are, so I want to ignore the geometry column.

library(sf)
library(dplyr)

# Four points (three share the same Area/Zone), buffered into polygons
Areas <- st_as_sf(
  tibble(
    Area = c("Zone1", "Zone1", "Zone2", "Zone1"),
    Zone = c("Area27", "Area27", "Area42", "Area27"),
    lng  = c(20.1, 20.2, 20.1, 20.1),
    lat  = c(-1.1, -1.2, -1.1, -1.1)
  ),
  coords = c("lng", "lat")
) %>%
  st_buffer(100)

I am using dplyr's distinct() to remove duplicate records, but the geometry column is still being used to determine which records are distinct, even though I believe this call should ignore it:

Areas %>% distinct(across(-geometry), .keep_all = TRUE)

However, it returns two results for Zone1 and Area27 when the geometry is different. Is this expected behaviour, or am I doing something wrong?

My required output would have only two rows, one for Zone1 & Area27 and another for Zone2 & Area42, with the geometry for those rows retained, i.e. something similar to what happens when you run the same code on a normal tibble:

Table <- tibble(
  Area = c("Zone1", "Zone1", "Zone2", "Zone1"),
  Zone = c("Area27", "Area27", "Area42", "Area27"),
  lng  = c(20.1, 20.2, 20.1, 20.1),
  lat  = c(-1.1, -1.2, -1.1, -1.1)
)

Table %>% distinct(across(c(-lng, -lat)), .keep_all = TRUE)

Answer 1

I found an alternative method:

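Areas %>%
  group_by(Area, Zone) %>%
  mutate(id = row_number()) %>%  # number the rows within each Area/Zone group
  filter(id == 1) %>%            # keep only the first row of each group
  select(-id)                    # drop the helper column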

If you are dealing with a dataset with a lot of polygons, this is likely to be faster than @Waldi's answer (at least it was for me).
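The same "first row per group" idea can probably be written more compactly with slice(), for which sf provides a method. This is only a sketch, assuming the Areas object from the question; first_per_group is a name chosen here for illustration, and no timing comparison is implied.

```r
library(sf)
library(dplyr)

# Example data from the question: four buffered points, no CRS set
Areas <- st_as_sf(
  tibble(
    Area = c("Zone1", "Zone1", "Zone2", "Zone1"),
    Zone = c("Area27", "Area27", "Area42", "Area27"),
    lng  = c(20.1, 20.2, 20.1, 20.1),
    lat  = c(-1.1, -1.2, -1.1, -1.1)
  ),
  coords = c("lng", "lat")
) %>%
  st_buffer(100)

# Keep the first row of every Area/Zone group, then drop the grouping
first_per_group <- Areas %>%
  group_by(Area, Zone) %>%
  slice(1) %>%
  ungroup()

nrow(first_per_group)
```

The ungroup() at the end is optional; without it the result stays grouped by Area and Zone, as it does in the mutate/filter version above.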

Answer 2

You could summarize:

Areas %>% group_by(Area, Zone) %>% summarize()

# A tibble: 2 x 3
# Groups:   Area [2]
  Area  Zone                                                                          geometry
  <chr> <chr>                                                                        <POLYGON>
1 Zone1 Area27 ((120.2 -1.2, 120.063 -6.433596, 119.6522 -11.65285, 118.9688 -16.84345, 118.0~
2 Zone2 Area42 ((120.1 -1.1, 119.963 -6.333596, 119.5522 -11.55285, 118.8688 -16.74345, 117.9~
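One thing to keep in mind: summarise() on an sf object combines the geometries within each group by default (its do_union argument defaults to TRUE), so the polygon returned for Zone1/Area27 is the union of that group's polygons rather than one of the original ones. If a single original polygon per attribute combination is wanted, a base R sketch along these lines may help. It assumes the Areas object from the question; first_rows and is_dup are illustrative names, not part of sf or dplyr.

```r
library(sf)
library(dplyr)

# Example data from the question: four buffered points, no CRS set
Areas <- st_as_sf(
  tibble(
    Area = c("Zone1", "Zone1", "Zone2", "Zone1"),
    Zone = c("Area27", "Area27", "Area42", "Area27"),
    lng  = c(20.1, 20.2, 20.1, 20.1),
    lat  = c(-1.1, -1.2, -1.1, -1.1)
  ),
  coords = c("lng", "lat")
) %>%
  st_buffer(100)

# Look for duplicates in the attribute columns only, with the geometry dropped
is_dup <- duplicated(st_drop_geometry(Areas)[, c("Area", "Zone")])

# Keep the first occurrence of each Area/Zone pair; its geometry is untouched
first_rows <- Areas[!is_dup, ]

nrow(first_rows)
```

Because the rows are selected by index rather than summarised, each kept row carries the geometry it had in Areas.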

Solution 1
2 accepted 2020-09-02 13:20:20

Solution 2
0 2020-09-01 21:35:16
