Apply a function to each cell in a column of a dataframe in R
EDIT: Thanks to @user5249203 for pointing out that geocoding is best done with ggmap's geocode call. Watch out for NAs though.
I am struggling with the apply family in R.
I am using a function which takes in a string and returns latitude and longitude:
> gGeoCode("Philadelphia, PA")
[1] 39.95258 -75.16522
I have a simple dataframe which has the names of all 50 states:
dput(state_lat_long)
structure(
list(State = structure(
c(
32L, 28L, 43L, 5L, 23L, 34L,
30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
19L, 41L, 50L, 2L, 45L
), .Label = c(
"alabama", "alaska", "arizona",
"arkansas", "california", "colorado", "connecticut", "delaware",
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
"massachusetts", "michigan", "minnesota", "mississippi", "missouri",
"montana", "nebraska", "nevada", "new hampshire", "new jersey",
"new mexico", "new york", "north carolina", "north dakota", "ohio",
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
"washington", "west virginia", "wisconsin", "wyoming"
), class = "factor"
)), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)
To practice my apply skills, I simply want to apply gGeoCode to each cell in the only column of the state_lat_long dataframe.
Couldn't be much simpler.
Then what is the problem with this?
> View(apply(state_lat_long, function(x) gGeoCode(x)))
When I run this, I get:
Error in View : argument "FUN" is missing, with no default
which I don't understand, because FUN is not missing.
So, let's try sapply. It's supposed to be simple, right? But what is wrong with this?
View(sapply(state_lat_long$State, function(x) gGeoCode(x)))
When I run this, I get 2 rows with 50 columns, packed with NAs. I can't make sense of it.
Next, I tried
View(apply(state_lat_long, 2, function(x) gGeoCode(x)))
and I got
State
40.71278
-74.00594
Again, this makes no sense! What am I doing wrong? Thanks.
Is this what your data frame looks like?
df = data.frame(State = c(
32L, 28L, 43L, 5L, 23L, 34L,
30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
19L, 41L, 50L, 2L, 45L
), Label = c(
"alabama", "alaska", "arizona",
"arkansas", "california", "colorado", "connecticut", "delaware",
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
"massachusetts", "michigan", "minnesota", "mississippi", "missouri",
"montana", "nebraska", "nevada", "new hampshire", "new jersey",
"new mexico", "new york", "north carolina", "north dakota", "ohio",
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
"washington", "west virginia", "wisconsin", "wyoming"
))
head(df)
State Label
1 32 alabama
2 28 alaska
3 43 arizona
4 5 arkansas
5 23 california
6 34 colorado
apply(df, 1, function(x) gGeoCode(x))
Alternatively,
mapply(FUN = gGeoCode, df$Label, SIMPLIFY = TRUE)
Note: Some states still return NA. Re-running the code fetches the missing coordinates. But I expect it would work more efficiently if we knew your input format / dataframe construction. Also, it is important to make sure the arguments you pass are what gGeoCode expects.
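To see how the calls in the question differ, here is a minimal sketch that runs offline. fake_geocode is a hypothetical stand-in for gGeoCode (it does no real geocoding and just returns two numbers per string), and the three-row states data frame is made up for illustration. It suggests that apply expects MARGIN as its second argument, that MARGIN = 2 hands the whole column to the function in one call, and that sapply over the character values calls the function once per cell. Whether the factor column is what produced the NAs in the question is an assumption; converting with as.character() simply rules it out.

```r
# Hypothetical stand-in for gGeoCode: takes ONE string, returns c(lat, lon).
# It does no real geocoding, so the example runs without network access.
fake_geocode <- function(place) {
  stopifnot(is.character(place), length(place) == 1)
  c(lat = nchar(place), lon = -nchar(place))
}

# Small made-up data frame with a factor column, like state_lat_long
states <- data.frame(State = factor(c("new york", "nevada", "texas")))

# apply(X, MARGIN, FUN): the second positional argument is MARGIN, so
# apply(states, function(x) ...) leaves FUN unmatched.
# With MARGIN = 2 the function is called once per COLUMN and receives
# the whole column as a single character vector:
col_lengths <- apply(states, 2, function(x) length(x))

# Looping over the values themselves calls the function once per cell.
# as.character() makes sure strings, not factor values, are passed.
coords <- sapply(as.character(states$State), fake_geocode)

# sapply binds each result as a column (2 rows x 3 columns here);
# t() turns that into one row per state.
coords_by_state <- t(coords)
coords_by_state
```

With 50 states the same pattern gives a 2 x 50 matrix before transposing, which matches the shape seen in the question.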
I realise this question was primarily about *apply, but if you were only after geocodes, an easier option would be to use a vectorised function, such as ggmap::geocode:
state_lat_long <- structure(
list(State = structure(
c(
32L, 28L, 43L, 5L, 23L, 34L,
30L, 13L, 14L, 38L, 22L, 25L, 15L, 20L, 24L, 40L, 46L, 21L, 9L,
18L, 48L, 10L, 7L, 4L, 3L, 31L, 35L, 37L, 49L, 44L, 12L, 6L,
17L, 36L, 11L, 39L, 42L, 8L, 47L, 33L, 16L, 1L, 29L, 27L, 26L,
19L, 41L, 50L, 2L, 45L
), .Label = c(
"alabama", "alaska", "arizona",
"arkansas", "california", "colorado", "connecticut", "delaware",
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana",
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland",
"massachusetts", "michigan", "minnesota", "mississippi", "missouri",
"montana", "nebraska", "nevada", "new hampshire", "new jersey",
"new mexico", "new york", "north carolina", "north dakota", "ohio",
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia",
"washington", "west virginia", "wisconsin", "wyoming"
), class = "factor"
)), .Names = "State", row.names = c(NA,-50L), class = "data.frame"
)
library(ggmap)
## to make sure we're using the correct geocode function I call it with 'ggmap::geocode'
ggmap::geocode(as.character(state_lat_long$State))
...
# lon lat
# 1 -74.00594 40.71278
# 2 -116.41939 38.80261
# 3 -99.90181 31.96860
# 4 -119.41793 36.77826
# 5 -94.68590 46.72955
# 6 -101.00201 47.55149
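Since the question's edit warns about NAs, a small follow-up sketch may help: it puts the coordinates next to the state names and lists the states whose lookup failed. The coords data frame here is built by hand so the example runs offline; its first three rows copy the output above, and the NA row for california is made up purely to demonstrate the check. The names result and failed are illustrative.

```r
# Illustrative values only: the first three rows match the output above,
# the NA row is invented to show how a failed lookup would be detected.
states <- c("new york", "nevada", "texas", "california")
coords <- data.frame(
  lon = c(-74.00594, -116.41939, -99.90181, NA),
  lat = c(40.71278, 38.80261, 31.96860, NA)
)

# Keep names and coordinates side by side
result <- data.frame(State = states, coords, stringsAsFactors = FALSE)

# States whose lookup failed and may need another attempt
failed <- result$State[is.na(result$lon) | is.na(result$lat)]
failed
```

In real use, coords would be the data frame returned by the geocode call, and failed would be the vector to pass back in for a retry.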