
Extract (long, lat) from a long address string in R

4517 bessie dr dallas, tx 75211 (32.728761, -96.895678)
3700 ross ave dallas, tx 75204 (32.797677, -96.786384)

I have a column in a data frame with values like the ones listed above. I want to create two new fields, long and lat, that hold the two numbers inside the parentheses, split at the comma.

This is what I have so far:

data$longlat<-str_split(data$geocoded_column,sub("\\(.*", "", data$geocoded_column))
data$longlat<-str_sub(data$longlat,start=9)
which gives me:
32.728761, -96.895678)"
32.797677, -96.786384)"

You can extract the values using stringr and lookaround:

library(stringr)
str_extract_all(x, "(?<=\\()[^(]+(?=\\))")
[[1]]
[1] "32.728761, -96.895678" "32.797677, -96.786384"

To get the values into a dataframe:

df <- data.frame(
  # the first number in each pair is the latitude, the second the longitude
  lat = unlist(str_extract_all(x, "(?<=\\()[^(,]+(?=,.*\\))")),
  long = unlist(str_extract_all(x, "(?<=, )[^(]+(?=\\))"))
)
df
        lat       long
1 32.728761 -96.895678
2 32.797677 -96.786384

Data:

x <- "4517 bessie dr dallas, tx 75211 (32.728761, -96.895678) 3700 ross ave dallas, tx 75204 (32.797677, -96.786384)"
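The example above works on a single string x, while the question has one address per row in data$geocoded_column. A minimal sketch of the per-row case, assuming each value ends in a "(number, number)" pair and that the first number is the latitude (as it is for these Dallas coordinates). The data frame below is a hypothetical stand-in built from the two sample rows, and capture groups with str_match are used here in place of lookaround:

```r
library(stringr)

# Hypothetical stand-in for the asker's data frame; the column name
# geocoded_column is taken from the question.
data <- data.frame(
  geocoded_column = c(
    "4517 bessie dr dallas, tx 75211 (32.728761, -96.895678)",
    "3700 ross ave dallas, tx 75204 (32.797677, -96.786384)"
  ),
  stringsAsFactors = FALSE
)

# str_match returns a character matrix: column 1 is the full match,
# columns 2 and 3 are the two capture groups.
m <- str_match(data$geocoded_column, "\\((-?[0-9.]+),\\s*(-?[0-9.]+)\\)")

# Convert the captured text to numbers; rows without a match become NA.
data$lat  <- as.numeric(m[, 2])
data$long <- as.numeric(m[, 3])
```

Because str_match returns one row per input element, the result lines up with the original rows without any unlist step.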

Does this work?

library(dplyr)
library(tidyr)
df <- data.frame(c1 = c('4517 bessie dr dallas, tx 75211 (32.728761, -96.895678)','3700 ross ave dallas, tx 75204 (32.797677, -96.786384)'))
df
                                                       c1
1 4517 bessie dr dallas, tx 75211 (32.728761, -96.895678)
2  3700 ross ave dallas, tx 75204 (32.797677, -96.786384)
df %>% extract(col = c1, into = c('lat','lon'), regex = '(-?\\d+\\.\\d+), (-?\\d+\\.\\d+)', remove = FALSE)
                                                       c1       lat        lon
1 4517 bessie dr dallas, tx 75211 (32.728761, -96.895678) 32.728761 -96.895678
2  3700 ross ave dallas, tx 75204 (32.797677, -96.786384) 32.797677 -96.786384
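If adding dplyr and tidyr is not an option, the same capture-group idea can probably be written in base R with utils::strcapture, which also returns numeric columns instead of strings. This is only a sketch: the vector addr and the column names lat and lon are assumptions made for illustration, and the pattern assumes the pair is always written as "(number, number)".

```r
# Sample addresses copied from the question
addr <- c(
  "4517 bessie dr dallas, tx 75211 (32.728761, -96.895678)",
  "3700 ross ave dallas, tx 75204 (32.797677, -96.786384)"
)

# proto supplies the names and types of the output columns;
# each capture group in the pattern fills one column.
coords <- strcapture(
  pattern = "\\((-?[0-9.]+), *(-?[0-9.]+)\\)",
  x = addr,
  proto = data.frame(lat = numeric(), lon = numeric())
)

# Attach the coordinates to the original strings
result <- cbind(data.frame(c1 = addr, stringsAsFactors = FALSE), coords)
```

Rows where the pattern does not match come back as NA, so malformed addresses are easy to spot afterwards.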
 
