
Apply a function to each cell in a column of a data frame in R

EDIT: Thanks to @user5249203 for pointing out that geocoding is best done with ggmap's geocode call. Watch out for NAs, though.

I am struggling with the apply family in R.

I am using a function which takes in a string and returns its latitude and longitude:

> gGeoCode("Philadelphia, PA")
[1] 39.95258 -75.16522

I have a simple data frame which has the names of all 50 states:

dput(state_lat_long)
structure(
  list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

To practice my apply skills, I simply want to apply gGeoCode to each cell in the only column of the state_lat_long data frame.

Couldn't be much simpler.

So what is the problem with this?

> View(apply(state_lat_long, function(x) gGeoCode(x)))

When I run this, I get:

Error in View : argument "FUN" is missing, with no default  

which I don't understand, because FUN is not missing.

So, let's try sapply. It's supposed to be simple, right?

But what is wrong with this?

View(sapply(state_lat_long$State, function(x) gGeoCode(x)))

When I run this, I get 2 rows with 50 columns, packed with NAs. I can't make sense of it.

Next, I tried

View(apply(state_lat_long, 2, function(x) gGeoCode(x)))  

and I got

     State
  40.71278
 -74.00594  

Again, this makes no sense!

What am I doing wrong? Thanks.

Is this what your data frame looks like?

df = data.frame(State = c(
    32L, 28L, 43L, 5L, 23L, 34L,
    30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
    18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
    17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
    19L, 41L, 50L, 2L, 45L
  ), Label = c(
    "alabama", "alaska", "arizona",
    "arkansas", "california", "colorado", "connecticut", "delaware",
    "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
    "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
    "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
    "montana", "nebraska", "nevada", "new hampshire", "new jersey",
    "new mexico", "new york", "north carolina", "north dakota", "ohio",
    "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
    "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
    "washington", "west virginia", "wisconsin", "wyoming"
  ))

head(df)
  State      Label
1    32    alabama
2    28     alaska
3    43    arizona
4     5   arkansas
5    23 california
6    34   colorado

# pass only the state name (the Label column), not the whole row
apply(df, 1, function(x) gGeoCode(x[["Label"]]))

Alternatively,

mapply(FUN = gGeoCode, df$Label, SIMPLIFY = TRUE)
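To see why the attempts in the question behaved as they did, it can help to check what each call actually hands to the function. In apply(X, MARGIN, FUN), the second positional argument is MARGIN, so a function passed in that position leaves FUN missing. The sketch below uses a hypothetical helper, describe(), in place of gGeoCode (which needs network access), and a three-row toy data frame; neither is from the original post.

```r
# Hypothetical helper standing in for gGeoCode(): it only reports
# the class and length of whatever it receives.
describe <- function(x) paste(class(x)[1], length(x))

# Toy data frame with a factor column, like state_lat_long$State
toy <- data.frame(State = factor(c("new york", "nevada", "texas")))

# MARGIN = 2: one call per column, so the function sees the whole column at once
per_column <- apply(toy, 2, describe)

# MARGIN = 1: one call per row (apply() first converts the data frame
# to a character matrix)
per_row <- apply(toy, 1, describe)

# sapply() over a factor passes one-element factors, not strings
per_factor <- sapply(toy$State, describe)

# Converting to character first passes one string per call
per_string <- sapply(as.character(toy$State), describe)

print(per_column)
print(per_row)
print(per_factor)
print(per_string)
```

With MARGIN = 2 the function is called once for the whole column, which is consistent with the question getting a single pair of coordinates back. The one-element factors that sapply passes may be why gGeoCode returned NA there, though that depends on how gGeoCode treats non-character input.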

Note: Some states still return NA. Re-running the code fetches the missing coordinates. It would probably work more efficiently if we knew your input format and how the data frame was constructed. Also, make sure the arguments you pass are what gGeoCode expects.
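Since a second run tends to fill in the gaps, one option is to retry only the lookups that came back NA. Below is a minimal sketch, assuming the geocoder takes one string and returns a numeric vector. make_flaky_geocoder() and geocode_with_retry() are hypothetical names, and the flaky stand-in replaces gGeoCode so the example runs offline; why the real service returns NA is not something this sketch explains.

```r
# Hypothetical stand-in for gGeoCode(): the first request for a place returns NA,
# later requests return coordinates. Not the real service, just enough to run offline.
make_flaky_geocoder <- function() {
  seen <- character(0)
  function(place) {
    if (!(place %in% seen)) {
      seen <<- c(seen, place)
      return(c(NA_real_, NA_real_))
    }
    # The Philadelphia result from the question, reused for every place
    c(39.95258, -75.16522)
  }
}

# Hypothetical helper: call `geocoder` until it returns no NA
# or `max_tries` attempts have been made.
geocode_with_retry <- function(place, geocoder, max_tries = 3, wait = 0) {
  result <- c(NA_real_, NA_real_)
  for (i in seq_len(max_tries)) {
    result <- geocoder(place)
    if (!anyNA(result)) break
    Sys.sleep(wait)  # pause between attempts; 0 here so the example runs instantly
  }
  result
}

flaky <- make_flaky_geocoder()
places <- c("alabama", "alaska")

# One row per place, one column per coordinate
coords <- t(sapply(places, geocode_with_retry, geocoder = flaky))
print(coords)
```

With a real service, a non-zero wait between attempts is probably kinder to the API than retrying immediately.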

I realise this question was primarily about *apply, but if you were only after geocodes, an easier option would be to use a vectorised function such as ggmap::geocode.

state_lat_long <- structure(
    list(State = structure(
    c(
      32L, 28L, 43L, 5L, 23L, 34L,
      30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
      18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
      17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
      19L, 41L, 50L, 2L, 45L
    ), .Label = c(
      "alabama", "alaska", "arizona",
      "arkansas", "california", "colorado", "connecticut", "delaware",
      "florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
      "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
      "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
      "montana", "nebraska", "nevada", "new hampshire", "new jersey",
      "new mexico", "new york", "north carolina", "north dakota", "ohio",
      "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
      "south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
      "washington", "west virginia", "wisconsin", "wyoming"
    ), class = "factor"
  )), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)

library(ggmap)

## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
#           lon      lat
# 1   -74.00594 40.71278
# 2  -116.41939 38.80261
# 3   -99.90181 31.96860
# 4  -119.41793 36.77826
# 5   -94.68590 46.72955
# 6  -101.00201 47.55149
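To keep the state names next to the coordinates and spot any rows that came back NA, something like the following could work. It's only a sketch: `coords` is a hand-made stand-in for the geocode() result, using values from the output above plus an NA row invented for illustration, so it runs without network access.

```r
# First three states, in the same order as state_lat_long$State
states <- c("new york", "nevada", "texas")

# Stand-in for the data frame returned by the geocoding call.
# The NA row is invented here to illustrate a failed lookup.
coords <- data.frame(
  lon = c(-74.00594, NA, -99.90181),
  lat = c(40.71278, NA, 31.96860)
)

# Put the names and coordinates side by side
result <- data.frame(State = states, coords, stringsAsFactors = FALSE)

# States whose lookup failed and may need another attempt
missing_states <- result$State[!complete.cases(result[, c("lon", "lat")])]

print(result)
print(missing_states)
```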
