R: Applying a custom function row by row using mutate()

Question

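I wrote a function that uses st_join() from the sf package to extract the congressional district (a polygon) for a set of latitude and longitude coordinates, using a different shapefile to identify the district depending on the specified "congress" argument. (This is necessary because districts are periodically redrawn, so boundaries change over time.) The next step is to apply the function row by row to a data frame containing multiple rows of coordinates (and associated "congress" values), so that the congress value for a given row determines which shapefile to use, and then assign the extracted district to a new variable.

I'm having trouble applying this function row by row. I first tried the rowwise() and mutate() functions from dplyr, but got a "must be size 1" error. Based on a comment on this question, I put list() around the variable assigned inside mutate(), but this made the new variable a list rather than a single character string.

I would greatly appreciate help figuring out a way to either (i) modify the function so that it can be applied row by row using rowwise() and mutate(), or (ii) apply my function row by row in some other way.

Reproducible code is below; you just need to download two shapefiles ("districts104.zip" and "districts111.zip") from https://cdmaps.polisci.ucla.edu/, unzip them, and put them in your working directory.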

library(tidyverse)
library(sf)

districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")

congress <- c(104, 111)
latitude <- c(37.32935, 37.32935)
longitude <- c(-122.00954, -122.00954)
df_test <- data.frame(congress, latitude, longitude)

point_geo_test <- st_as_sf(df_test,
                             coords = c(x = "longitude", y = "latitude"),
                             crs = st_crs(districts_104)) # prep for st_join()

sf_use_s2(FALSE) # preempt evaluation error that would otherwise pop up when using the st_join function

extract_district <- function(points, cong) {
  shapefile <- get(paste0("districts_", cong))
  st_join_results <- st_join(points, shapefile, join = st_within)
  paste(st_join_results$STATENAME, st_join_results$DISTRICT, sep = "-")
}

point_geo_test <- point_geo_test %>%
  rowwise %>%
  mutate(district = list(extract_district(points = point_geo_test, cong = congress)))

Answer 1

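Edit, July 7:

From your comments, I understand that you are looking for something different, and my assumption about why your function returns multiple values was wrong. So here is a new answer, starting from scratch:

The custom function you wrote is not suited to row-by-row application, because it already processes all rows at once.

Given the following input: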

congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)

point_geo_test contains the following values:

> point_geo_test
[...]
  congress                   geometry
1      104 POINT (-122.0095 37.32935)
2      111 POINT (-122.0095 37.32935)
3      104   POINT (73.72036 41.1134)
4      111   POINT (73.72036 41.1134)
5      104 POINT (-87.86885 42.15549)
6      111 POINT (-87.86885 42.15549)

and extract_district() returns this:

> extract_district(point_geo_test, 104)
[...]
[1] "California-14" "California-14" "NA-NA"         "NA-NA"         "Illinois-10"   "Illinois-10"

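This is already one result per row. The only problem is that, although each result is correct for that row's coordinates, it is the district name those coordinates had during the 104th Congress only. So these values are valid only for the rows of point_geo_test where congress == 104.

Extracting the correct values for all rows

We will create a function that returns the correct data for all rows, i.e. the correct name for the coordinates during the relevant Congress.

I simplified your code slightly: df_test is no longer an intermediate data frame; it is defined directly in the creation of point_geo_test. Any values I extract are also saved to this data frame.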
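To make the size mismatch concrete, here is a minimal sketch that needs no shapefiles. label_all() is a hypothetical stand-in for extract_district(): like the original, it receives the whole table and returns one value per row. The data frame and column names are made up for illustration, and the sketch assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for extract_district(): one value per row of `points`
label_all <- function(points, cong) paste(points$id, cong, sep = "-")

df <- data.frame(id = c("a", "b", "c"), congress = c(104, 111, 104))

# Inside rowwise(), `congress` is a single value, but `df` is still the whole
# table, so the function returns 3 values for a cell that can hold only 1.
rowwise_result <- tryCatch(
  df %>% rowwise() %>% mutate(label = label_all(df, congress)),
  error = function(e) "failed"
)

# Wrapping the call in list() silences the error, but every cell then holds
# all 3 values, which is the list column described in the question.
wrapped <- df %>% rowwise() %>% mutate(label = list(label_all(df, congress)))
```

So neither attempt is really a dplyr problem: the function is handed every point on every call.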

library(tidyverse)
library(sf)
sf_use_s2(FALSE)

districts_104 <- st_read("districts104.shp")
districts_111 <- st_read("districts111.shp")

congress <- c(104, 111, 104, 111, 104, 111)
latitude <- c(37.32935, 37.32935, 41.1134016, 41.1134016, 42.1554948, 42.1554948)
longitude <- c(-122.00954, -122.00954, 73.720356, 73.720356, -87.868850502543, -87.868850502543)

point_geo_test <- st_as_sf(data.frame(congress, latitude, longitude),
                           coords = c(x = "longitude", y = "latitude"),
                           crs = st_crs(districts_104))

To make the code more flexible and organized, I will create a general function that can fetch any parameter for the given coordinates:

extract_values <- function(points, parameter) {
  # initialize return values, one for each row in `points`
  values <- rep(NA, nrow(points))
  
  # for each congress present in `points`, lookup parameter and store in the rows with matching congress
  for(cong in unique(points$congress)) {
    shapefile <- get(paste0("districts_", cong))
    st_join_results <- st_join(points, shapefile, join = st_within)
    values[points$congress == cong] <- st_join_results[[parameter]][points$congress == cong]
  }
  
  return(values)
}

Example:

> extract_values(point_geo_test, 'STATENAME')
[1] "California" "California" NA           NA           "Illinois"   "Illinois"  
> extract_values(point_geo_test, 'DISTRICT')
[1] "14" "15" NA   NA   "10" "10"
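The core of extract_values() is a masked assignment: compute a result for every row, then keep only the rows that belong to the current congress. A base R sketch of just that step, with a hypothetical lookup list standing in for the st_join() results:

```r
# Toy stand-ins (hypothetical): per congress, a result for every row
congress <- c(104, 111, 104)
lookup <- list(`104` = c("A", "B", "C"), `111` = c("X", "Y", "Z"))

values <- rep(NA_character_, length(congress))
for (cong in unique(congress)) {
  all_rows <- lookup[[as.character(cong)]]  # result computed for every row
  keep <- congress == cong                  # rows that belong to this congress
  values[keep] <- all_rows[keep]            # overwrite only those rows
}
values
# [1] "A" "Y" "C"
```

Rows 1 and 3 take their value from the 104 results and row 2 from the 111 results, which is how the DISTRICT example above ends up with "14" for congress 104 and "15" for congress 111 at the same coordinates.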

Storing the values

point_geo_test$state <- extract_values(point_geo_test, 'STATENAME')
point_geo_test$district <- extract_values(point_geo_test, 'DISTRICT')
point_geo_test$name <- paste(point_geo_test$state, point_geo_test$district, sep = "-")

Result:

> point_geo_test
Simple feature collection with 6 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -122.0095 ymin: 37.32935 xmax: 73.72036 ymax: 42.15549
Geodetic CRS:  GRS 1980(IUGG, 1980)
  congress      state district          name                   geometry
1      104 California       14 California-14 POINT (-122.0095 37.32935)
2      111 California       15 California-15 POINT (-122.0095 37.32935)
3      104       <NA>     <NA>         NA-NA   POINT (73.72036 41.1134)
4      111       <NA>     <NA>         NA-NA   POINT (73.72036 41.1134)
5      104   Illinois       10   Illinois-10 POINT (-87.86885 42.15549)
6      111   Illinois       10   Illinois-10 POINT (-87.86885 42.15549)
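A possible variation, offered as a sketch rather than as part of the answer above: join only the rows that belong to each congress, and keep the district layers in a named list instead of looking them up with get(). The square() helper, the districts list and the toy polygons are all hypothetical stand-ins for the real shapefiles, and the sketch assumes each point falls within at most one district polygon, so st_join() returns exactly one row per point.

```r
library(sf)
sf_use_s2(FALSE)  # planar predicates, as in the code above

# Hypothetical helper: a rectangular "district" polygon
square <- function(xmin, xmax, ymin, ymax) {
  st_polygon(list(rbind(c(xmin, ymin), c(xmax, ymin), c(xmax, ymax),
                        c(xmin, ymax), c(xmin, ymin))))
}

# Toy stand-ins for the two shapefiles: same area, different district number
districts <- list(
  `104` = st_sf(DISTRICT = "14",
                geometry = st_sfc(square(-123, -121, 37, 38), crs = 4326)),
  `111` = st_sf(DISTRICT = "15",
                geometry = st_sfc(square(-123, -121, 37, 38), crs = 4326))
)

pts <- st_as_sf(data.frame(congress  = c(104, 111, 104),
                           longitude = c(-122.00954, -122.00954, 73.720356),
                           latitude  = c(37.32935, 37.32935, 41.1134016)),
                coords = c("longitude", "latitude"), crs = 4326)

district <- rep(NA_character_, nrow(pts))
for (cong in unique(pts$congress)) {
  rows <- which(pts$congress == cong)  # only the points for this congress
  joined <- st_join(pts[rows, ], districts[[as.character(cong)]],
                    join = st_within)
  district[rows] <- as.character(joined$DISTRICT)
}
pts$district <- district
```

With the real data, the list would hold the objects returned by st_read(). Each point is then joined against one shapefile only, which may matter when there are many rows or many congresses.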
