Isolating specific numeric values in text

Question

I have a large amount of text in a CSV file describing various properties, and I need to find the floor area of each property in square metres. For example:

string <- "This is a wonderful 120 sqm flat with a stunning view"

I know that I can use the following to extract the numeric value:

sqm <- as.numeric(gsub("\\D", "", string))

which returns a numeric vector of '120', as it should. However, I was wondering if there is a more sophisticated way to accomplish this, given that there could be other irrelevant numeric values in the text?

Is there some way to search for 'sqm' and return the numbers that precede it? Many thanks for any comments.

Answer 1

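I believe this regex with a lookahead should work. Recent versions of stringr accept the pattern directly, so the older (now deprecated) perl() wrapper is not needed:

library(stringr)
##
string <- "This is a wonderful 120 sqm flat with a stunning view"
# one or more digits, but only if followed by optional whitespace and "sqm"
re <- "\\d+(?=\\s?sqm)"
##
R> str_extract(string, re)
[1] "120"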
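If you would rather avoid the stringr dependency, the same lookahead idea can be sketched in base R. This is only a minimal sketch under a few assumptions: the `descriptions` vector and the `extract_sqm` helper are made up for illustration, the unit is always written as "sqm", and decimal areas use a dot.

```r
# Hypothetical example descriptions; the second one contains unrelated numbers.
descriptions <- c(
  "This is a wonderful 120 sqm flat with a stunning view",
  "Built in 1998, 3 bedrooms, 85.5 sqm, close to the station",
  "Charming flat with no size given"
)

# Digits with an optional decimal part, only if followed by
# optional whitespace and the literal "sqm" (lookahead needs perl = TRUE).
pattern <- "\\d+(?:\\.\\d+)?(?=\\s?sqm)"

# Illustrative helper: returns one numeric value per input string,
# NA where no "<number> sqm" is found.
extract_sqm <- function(x) {
  m <- regexpr(pattern, x, perl = TRUE)  # first match per string, -1 if none
  hit <- m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)           # regmatches() returns matches only
  as.numeric(out)
}

sqm <- extract_sqm(descriptions)
print(sqm)
```

For the three strings above this gives 120, 85.5 and NA, so the year and the bedroom count are ignored. Note that `regexpr()` only looks at the first match in each string; if a description could mention several areas, `gregexpr()` would be the thing to try instead.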

solution1
2 ACCEPTED 2015-04-29 14:10:43
