Extract a substring between two characters in R (REGEX)

I am having trouble using regular expressions to extract a latitude and longitude from a string. The string is this:

[1] "\"42.352800\" data-longitude=\"-71.187500\" \"22\"></div>"

I want to get the first number, "42.352800", and the second number, "-71.187500", separately as two variables. Because I'll be doing this on a bunch of entries, I need to make sure it gets these numbers whether they are positive or negative.

I figured I should use a regular expression that basically says:

latitude <- from " to " (to get the first number)

and then something similar to get the longitude.

Any ideas here? I am relatively new to regex.

I agree with @r2evans that if you are scraping this information from a webpage, it would be much simpler to get the data with an HTML parser such as rvest.
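As a rough sketch of what that could look like: the HTML fragment below is hypothetical. Only data-longitude appears in the string from the question, so the div tag and the data-latitude attribute name are assumptions that would need to be checked against the real page.

```r
library(rvest)

# Hypothetical fragment: the tag name and the data-latitude attribute are
# assumed; only data-longitude is visible in the question's string.
html <- '<div data-latitude="42.352800" data-longitude="-71.187500"></div>'

# Select the first element that carries a data-longitude attribute.
node <- html_element(read_html(html), "div[data-longitude]")

# Read each attribute as text and convert it to a number.
latitude  <- as.numeric(html_attr(node, "data-latitude"))
longitude <- as.numeric(html_attr(node, "data-longitude"))
```

Reading the attributes directly avoids depending on the order in which the numbers happen to appear in the raw markup.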

To answer your question, you can use str_match from stringr to get the first two numbers.

string <- "\"42.352800\" data-longitude=\"-71.187500\" \"22\"></div>"

# Capture two decimal numbers, each with an optional leading minus sign
stringr::str_match(string, '(-?\\d+\\.\\d+).*?(-?\\d+\\.\\d+)')[, -1]
#[1] "42.352800"  "-71.187500"
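If you would rather avoid the extra dependency, a base R version along the same lines might look like the sketch below. It assumes every entry lists the latitude before the longitude and that both are written with a decimal point, as in the example string. The variable names latitude and longitude are only illustrative.

```r
string <- "\"42.352800\" data-longitude=\"-71.187500\" \"22\"></div>"

# Find every optionally signed decimal number in each input string.
# gregexpr() returns match positions; regmatches() extracts the text.
nums <- regmatches(string, gregexpr("-?[0-9]+\\.[0-9]+", string))

# First match is taken as the latitude, second as the longitude.
# An entry with fewer than two matches yields NA instead of an error.
latitude  <- as.numeric(sapply(nums, `[`, 1))
longitude <- as.numeric(sapply(nums, `[`, 2))
```

Because gregexpr and regmatches are vectorized, string can also be a character vector of many entries, in which case latitude and longitude come back as numeric vectors of the same length.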
