Extract a substring between two characters in R (REGEX)

Question

I am having trouble using regular expressions to extract a longitude and latitude from a string. The string is this:

[1] "\"42.352800\" data-longitude=\"-71.187500\" \"22\"></div>"

I want to be able to get both the first number "42.352800" and the second number "-71.187500" separately as two variables. Because I'll be doing this on a bunch of entries, I need to make sure that it can get these numbers whether they are positive or negative.

I figured I should be using a regular expression to say basically:

latitude <- from " to " (to get the first number)

and then something similar to get the longitude.

Any ideas here? I am relatively new to regex.

Answer 1

I agree with @r2evans that if you are scraping this information from a webpage it would be much simpler to get data using rvest for example.
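As a rough sketch of the rvest route: the original page is not shown, so the HTML snippet below is hypothetical. In particular, the attribute name `data-latitude` and the `id` are assumptions modelled on the `data-longitude` attribute visible in the question's string. With markup of that shape, the attributes can be read directly instead of being parsed out of raw text:

```r
library(rvest)

# Hypothetical markup; the real page structure is not given in the question.
html <- '<div id="map" data-latitude="42.352800" data-longitude="-71.187500"></div>'

# Parse the snippet and select the first <div> element
node <- html_element(read_html(html), "div")

# Read each attribute as text, then convert to numeric
latitude  <- as.numeric(html_attr(node, "data-latitude"))
longitude <- as.numeric(html_attr(node, "data-longitude"))
```

Because the values come from named attributes, the sign of each coordinate needs no special handling.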

To answer your question, you can use str_match from the stringr package to capture the first two decimal numbers in the string.

string <- "\"42.352800\" data-longitude=\"-71.187500\" \"22\"></div>"

# -? on both groups so that a negative latitude is captured as well
stringr::str_match(string, '(-?\\d+\\.\\d+).*?(-?\\d+\\.\\d+)')[, -1]
#[1] "42.352800"  "-71.187500"
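Since the question asks for two separate variables, here is a minimal base R sketch of one way to get them, without stringr. It assumes each entry is a single string containing at least two decimal numbers, with the latitude appearing first; the variable names `nums`, `latitude` and `longitude` are illustrative.

```r
string <- "\"42.352800\" data-longitude=\"-71.187500\" \"22\"></div>"

# Find every optionally signed decimal number (digits, a dot, digits).
# The trailing "22" has no decimal point, so it is not matched.
nums <- regmatches(string, gregexpr("-?\\d+\\.\\d+", string, perl = TRUE))[[1]]

# Assumption: latitude comes first, longitude second
latitude  <- as.numeric(nums[1])
longitude <- as.numeric(nums[2])
```

For a vector of many strings, the same `gregexpr` call returns one list element per entry, so the two values can be pulled out with `sapply` or a similar loop.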
