

Extract and split lat long coordinates from WKT point data in R

I'm sure this will be a very straightforward answer. I am new to R and still finding my way around its data types. I'm currently importing data from MySQL, but I can't quite figure out how to separate the two values bracketed inside a WKT POINT.

I am running the following statement, which queries a shapefile stored in a database.

library(DBI)
library(RMySQL)

mydb <- dbConnect(MySQL(), user = 'root', password = 'mrwolf',
                  dbname = 'jtw_schema', host = 'localhost')

# astext() returns the geometry as WKT text (on MySQL 8.0+ use ST_AsText() instead)
strSQL <- "select sa2_main11, astext(shape) as geom from centroids
    where gcc_name11 = 'Greater Sydney'
      and sa4_name11 != 'Central Coast'
      and sa4_name11 not like '%Outer West%'
      and sa4_name11 not like '%Baulkham Hills%'
      and sa4_name11 not like '%Outer South West%'"

dfCord <- dbGetQuery(mydb, strSQL)
dbDisconnect(mydb)

Which results in:

        sa2_main11                        geom
1    116011303 POINT(150.911550090995 -33.7568493603359)
2    116011304 POINT(150.889312296536 -33.7485997378428)
3    116011305 POINT(150.898781823296 -33.7817496751367)
4    116011306 POINT(150.872046414103 -33.7649465663774)

What I want to achieve is:

    sa2_main11        Long             Lat
1    116011303 150.911550090995 -33.7568493603359
2    116011304 150.889312296536 -33.7485997378428
3    116011305 150.898781823296 -33.7817496751367
4    116011306 150.872046414103 -33.7649465663774

Apologies if this is a very simple question, but I have searched for ways to separate WKT data and couldn't find any examples. I could try a string search or similar, but I imagine there is probably an "R-ish" way to do it.
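One base-R way to do this without extra packages is utils::strcapture() (available since R 3.4.0), which applies a regular expression with capture groups and converts each group to the type of the matching column in a prototype data frame. A minimal sketch, assuming every geom value is a plain two-dimensional WKT `POINT(x y)`; the stand-in data frame reuses the sample rows from the question in place of the real dfCord, and the column names long/lat are an arbitrary choice. WKT stores x before y, so the first number is the longitude and the second the latitude.

```r
# Stand-in for the dfCord result from the question (same sample rows)
dfCord <- data.frame(
  sa2_main11 = c(116011303, 116011304, 116011305, 116011306),
  geom = c("POINT(150.911550090995 -33.7568493603359)",
           "POINT(150.889312296536 -33.7485997378428)",
           "POINT(150.898781823296 -33.7817496751367)",
           "POINT(150.872046414103 -33.7649465663774)"),
  stringsAsFactors = FALSE
)

# WKT POINT is "POINT(x y)": x = longitude, y = latitude.
# Each capture group becomes one column; proto fixes the names and types.
coords <- utils::strcapture(
  "^POINT\\(([^ ]+) ([^)]+)\\)$",
  dfCord$geom,
  proto = data.frame(long = numeric(), lat = numeric())
)

# Rows whose geom does not match the pattern come back as NA
dfCord <- cbind(dfCord["sa2_main11"], coords)
print(dfCord, digits = 15)
```

Because the conversion happens inside strcapture(), the long and lat columns are already numeric, so no separate as.numeric() step is needed.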

Not a direct answer, but a workaround (assuming the geom column is a character vector; not sure if this is what you are looking for):

df <- data.frame(sa2_main11 = c("a", "b", "c", "d"),
                 geom = c("POINT(150.911550090995 -33.7568493603359)",
                          "POINT(150.889312296536 -33.7485997378428)",
                          "POINT(150.898781823296 -33.7817496751367)",
                          "POINT(150.872046414103 -33.7649465663774)"),
                 stringsAsFactors = FALSE)

# Lazy .*? stops at the first number after "POINT(", i.e. x (longitude)
df$longitude <- as.numeric(sub(".*?(-?[0-9]+[.][0-9]+).*", "\\1", df$geom, perl = TRUE))
# Greedy .* runs up to the last space, so this captures the second number, y (latitude)
df$latitude <- as.numeric(sub(".* (-?[0-9]+[.][0-9]+).*", "\\1", df$geom))
df$geom <- NULL
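The two regular-expression substitutions above each scan the whole column once. A variant of the same regex idea pulls both numbers in a single pass with gregexpr() and regmatches(), then stacks them into a two-column matrix. This is a sketch that assumes the only numbers in each string are the two coordinates, written in plain decimal notation (no exponents such as 1e-05).

```r
geom <- c("POINT(150.911550090995 -33.7568493603359)",
          "POINT(150.889312296536 -33.7485997378428)",
          "POINT(150.898781823296 -33.7817496751367)",
          "POINT(150.872046414103 -33.7649465663774)")

# Find every optionally negative decimal number in each string
nums <- regmatches(geom, gregexpr("-?[0-9]+(\\.[0-9]+)?", geom))

# Each list element holds c(x, y); rbind them into an n x 2 numeric matrix
xy <- do.call(rbind, lapply(nums, as.numeric))
colnames(xy) <- c("longitude", "latitude")
xy
```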

This works for your data set if you get df as a data.frame from your database.

df <- data.frame(sa2_main11 = c(116011303, 116011304, 116011305, 116011306),
                 geom = c("POINT(150.911550090995 -33.7568493603359)",
                          "POINT(150.889312296536 -33.7485997378428)",
                          "POINT(150.898781823296 -33.7817496751367)",
                          "POINT(150.872046414103 -33.7649465663774)"),
                 stringsAsFactors = FALSE)

# Strip "POINT" and the parentheses, leaving "x y"
geom <- sub("POINT", "", df$geom)
geom <- sub("[(]", "", geom)
geom <- sub("[)]", "", geom)
# Split on the space; unlist() gives x1, y1, x2, y2, ...
lonlat <- unlist(strsplit(geom, split = " "))
# WKT order is x y, so odd positions are longitude and even positions latitude
# (both columns stay character; wrap in as.numeric() if numbers are needed)
df$long <- lonlat[seq(1, length(lonlat), 2)]
df$lat <- lonlat[seq(2, length(lonlat), 2)]

#   sa2_main11                                      geom             long               lat
# 1  116011303 POINT(150.911550090995 -33.7568493603359) 150.911550090995 -33.7568493603359
# 2  116011304 POINT(150.889312296536 -33.7485997378428) 150.889312296536 -33.7485997378428
# 3  116011305 POINT(150.898781823296 -33.7817496751367) 150.898781823296 -33.7817496751367
# 4  116011306 POINT(150.872046414103 -33.7649465663774) 150.872046414103 -33.7649465663774
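The seq(1, length(lonlat), 2) / seq(2, length(lonlat), 2) indexing works because strsplit() followed by unlist() yields x1, y1, x2, y2, ... in order. The same interleaved vector can be reshaped directly into two columns with matrix(..., byrow = TRUE), and converting with as.numeric() first gives numbers rather than strings. A sketch, assuming every geom value splits into exactly two parts (otherwise the pairs drift out of alignment):

```r
geom <- c("POINT(150.911550090995 -33.7568493603359)",
          "POINT(150.889312296536 -33.7485997378428)")

# Drop the "POINT" keyword and both parentheses, leaving "x y"
xy_text <- gsub("POINT|[()]", "", geom)

# Interleaved numeric vector: x1, y1, x2, y2, ...
lonlat <- as.numeric(unlist(strsplit(xy_text, split = " ")))

# Fill row by row so each row is one (longitude, latitude) pair
xy <- matrix(lonlat, ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("long", "lat")))
xy
```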

In the end I managed to separate out the lat and long by changing the SQL query as follows, in particular by using the SUBSTR function. That seemed to make more sense than cleaning the data up inside R.

select sa2_main11,
       substr(ASTEXT(shape), 7, 12) as lon,
       case
           when ltrim(substr(ASTEXT(shape), 23, 12)) > 0
               then ltrim(substr(ASTEXT(shape), 23, 10)) * -1
           else ltrim(substr(ASTEXT(shape), 23, 12))
       end as lat
from centroids
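The SUBSTR() offsets in this query rely on the WKT text having a fixed layout: `POINT(` occupies characters 1 to 6, so the longitude starts at character 7, and for these rows the space before the latitude is character 23. A small R sketch that mimics MySQL's SUBSTR(str, pos, len) reproduces the values shown below and shows what happens when a coordinate has fewer digits. The helper name mysql_substr is hypothetical; note that R's own substr() takes a stop position rather than a length.

```r
# Hypothetical helper: MySQL-style SUBSTR(str, pos, len) built on R's substr(x, start, stop)
mysql_substr <- function(s, pos, len) substr(s, pos, pos + len - 1)

wkt <- "POINT(150.911550090995 -33.7568493603359)"

lon <- mysql_substr(wkt, 7, 12)           # 12 characters starting after "POINT("
lat <- trimws(mysql_substr(wkt, 23, 12))  # acts like LTRIM here: drops the leading space
c(lon, lat)

# The offsets are fixed, so a point with fewer digits spills into the next field
short_wkt <- "POINT(150.9 -33.75)"
mysql_substr(short_wkt, 7, 12)
```

The fixed widths also truncate precision (8 decimals for lon, 7 for lat here), which matches the output below.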

This produced the following output:

 sa2_main11, lon, lat
'116011303', '150.91155009', '-33.7568493'
'116011304', '150.88931229', '-33.7485997'
'116011305', '150.89878182', '-33.7817496'
'116011306', '150.87204641', '-33.7649465'
'116011307', '150.93909408', '-33.7617792'
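Since SUBSTR() returns text, lon (and possibly lat) may arrive in R from dbGetQuery() as character columns. If so, they can be converted in place. A minimal sketch using a hand-built stand-in for the query result rather than a live connection:

```r
# Stand-in for the query result, with coordinates stored as text
res <- data.frame(sa2_main11 = c("116011303", "116011304"),
                  lon = c("150.91155009", "150.88931229"),
                  lat = c("-33.7568493", "-33.7485997"),
                  stringsAsFactors = FALSE)

# Convert both coordinate columns to numeric in place
res[c("lon", "lat")] <- lapply(res[c("lon", "lat")], as.numeric)
str(res)
```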

Many thanks for your suggestions; they were all helpful in understanding R.

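Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.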

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM