
Apply a function to each cell in a column of a data frame in R

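Edit: thanks to @user5249203 for pointing out that the geocoding is best done with ggmap's geocode call. Beware of NAs.

I am struggling with R's apply family.

I am using a function that takes a string and returns a latitude and longitude: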

> gGeoCode("Philadelphia, PA")
[1] 39.95258 -75.16522

I have a simple data frame holding the names of all 50 states:

dput(state_lat_long)
structure(
  list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

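To practice my apply skills, I simply want to apply gGeoCode to every cell in the only column of the state_lat_long data frame.

It could not be simpler.

So what is wrong with this?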

> View(apply(state_lat_long, function(x) gGeoCode(x)))

When I run this command, I get:

Error in View : argument "FUN" is missing, with no default  

I don't understand, because FUN is not missing.

So let's try sapply. It should be simple, right?

But what is wrong with this?

View(sapply(state_lat_long$State, function(x) gGeoCode(x)))

When I run it, I get 2 rows of 50 columns each, all filled with NA. It makes no sense to me.

Next, I tried

View(apply(state_lat_long, 2, function(x) gGeoCode(x)))  

and I got

     State
  40.71278
 -74.00594  

Again, this makes no sense!

What am I doing wrong? Thanks.

Is this your data frame?

df = data.frame(State = c(
    32L, 28L, 43L, 5L, 23L, 34L,
    30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
    18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
    17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
    19L, 41L, 50L, 2L, 45L
  ), Label = c(
    "alabama", "alaska", "arizona",
    "arkansas", "california", "colorado", "connecticut", "delaware",
    "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
    "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
    "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
    "montana", "nebraska", "nevada", "new hampshire", "new jersey",
    "new mexico", "new york", "north carolina", "north dakota", "ohio",
    "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
    "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
    "washington", "west virginia", "wisconsin", "wyoming"
  ))

head(df)
  State      Label
1    32    alabama
2    28     alaska
3    43    arizona
4     5   arkansas
5    23 california
6    34   colorado

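# Restrict to the Label column so each call receives only the state name,
# not the numeric State code as well.
apply(df["Label"], 1, function(x) gGeoCode(x))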

Or,

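# as.character() guards against Label being a factor (the data.frame() default before R 4.0)
mapply(FUN = gGeoCode, as.character(df$Label), SIMPLIFY = TRUE)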

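Note: some states will still return NA. Re-running the code will fetch the missing coordinates. If we knew your input format and how the data frame was constructed, I expect this could be made to work more efficiently. It is also important to make sure that the argument you pass is what gGeoCode expects.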
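To make the behaviour of the three attempts in the question easier to see, here is a minimal sketch that runs offline. `fake_geocode` and its lookup table are hypothetical stand-ins invented for illustration (the coordinates are placeholders), and the sketch assumes the real gGeoCode expects a single character string; the real function may treat its input differently, and NAs can also come from the geocoding service not answering.

```r
# Hypothetical stand-in for gGeoCode: returns c(lat, lon) for a known name
# passed as a single character string, and c(NA, NA) for anything else.
fake_geocode <- function(address) {
  known <- list(
    "new york" = c(40.7, -74.0),
    "nevada"   = c(38.8, -116.4),
    "texas"    = c(32.0, -99.9)
  )
  if (is.character(address) && length(address) == 1 && address %in% names(known)) {
    known[[address]]
  } else {
    c(NA_real_, NA_real_)
  }
}

states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# 1. apply() is apply(X, MARGIN, FUN). A function given in second position
#    is matched to MARGIN, so FUN really is missing.
err <- tryCatch(apply(states, function(x) fake_geocode(x)),
                error = function(e) conditionMessage(e))

# 2. sapply() over a factor hands each element over as a length-one factor,
#    not as a character string; this stand-in rejects that and returns NA.
from_factor <- sapply(states$State, function(x) fake_geocode(x))

# 3. MARGIN = 2 passes each whole column as one vector, so the function is
#    called once per column rather than once per cell.
by_column <- apply(states, 2, function(x) length(x))

# Converting to character first gives one call per state; vapply() also
# checks that every call returns exactly two numbers.
coords <- t(vapply(as.character(states$State), fake_geocode,
                   numeric(2), USE.NAMES = FALSE))
colnames(coords) <- c("lat", "lon")
result <- cbind(states, coords)
```

With the real function, the last step would be the one to try first: swap `fake_geocode` for gGeoCode and `states` for state_lat_long.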

I realise this question is mainly about *apply, but if all you are after is the geocoding, a simple option is to use a vectorised function such as ggmap::geocode:

state_lat_long <- structure(
    list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

library(ggmap)

## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
#           lon      lat
# 1   -74.00594 40.71278
# 2  -116.41939 38.80261
# 3   -99.90181 31.96860
# 4  -119.41793 36.77826
# 5   -94.68590 46.72955
# 6  -101.00201 47.55149
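The result comes back as a separate table of coordinates, so a natural next step is to attach it to the state names and retry any rows that came back NA. The sketch below assumes a vectorised geocoder that returns a data frame with lon and lat columns, as in the output above. `make_flaky_geocoder` is a hypothetical offline stand-in that fails once for one state, and the coordinates are placeholders. Recent ggmap versions also need a Google API key registered with `register_google()` before `geocode()` will work.

```r
# Hypothetical offline stand-in for a vectorised geocoder such as ggmap::geocode.
# It takes a character vector and returns a data frame with lon and lat columns,
# and it returns NA for "nevada" on the first call only, imitating a flaky service.
make_flaky_geocoder <- function() {
  calls <- 0
  function(location) {
    calls <<- calls + 1
    lookup <- data.frame(
      name = c("new york", "nevada", "texas"),
      lon  = c(-74.0, -116.4, -99.9),
      lat  = c(40.7, 38.8, 32.0)
    )
    out <- lookup[match(location, lookup$name), c("lon", "lat")]
    if (calls == 1) out[location == "nevada", ] <- NA
    rownames(out) <- NULL
    out
  }
}
geocoder <- make_flaky_geocoder()

states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# One vectorised call, then attach the coordinates to the original rows.
result <- cbind(states, geocoder(as.character(states$State)))

# Retry only the rows that are still NA, at most three times.
for (attempt in 1:3) {
  is_missing <- is.na(result$lon) | is.na(result$lat)
  if (!any(is_missing)) break
  result[is_missing, c("lon", "lat")] <-
    geocoder(as.character(result$State[is_missing]))
}
```

To use it for real, replace `geocoder` with `ggmap::geocode` and `states` with state_lat_long. Capping the number of retries keeps the loop from running forever when a name genuinely cannot be geocoded.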

