Isolating specific numeric values in text

I have a large amount of text in a CSV file describing various properties, and I need to trawl through it to find each property's floor area in square metres. For example:

string <- "This is a wonderful 120 sqm flat with a stunning view"

I know that I can use the following to extract the numeric value:

sqm <- as.numeric(gsub("\\D", "", string)) 

which returns the number 120, as it should. However, the text could contain other, irrelevant numbers, and because gsub("\\D", "", string) deletes every non-digit character, those numbers would be run together with the area. Is there a more robust way to do this?
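To make the concern concrete, here is the same approach applied to a made-up listing (string2 is a hypothetical example, not from the real data) that mentions a bedroom count and a floor number as well as the area:

```r
# Hypothetical listing text with two extra, unrelated numbers
string2 <- "This 2 bedroom, 120 sqm flat is on floor 3"

# Removing every non-digit character concatenates "2", "120" and "3"
sqm <- as.numeric(gsub("\\D", "", string2))
sqm
#> [1] 21203
```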

Is there some way to search for 'sqm' and return the numbers that precede it? Many thanks for any comments.

A regex lookahead should work here: it matches the digits only when they are followed by "sqm", without including "sqm" itself in the result:

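library(stringr)

string <- "This is a wonderful 120 sqm flat with a stunning view"
# \\d+ matches the digits; the lookahead (?=\\s?sqm) only checks that an
# optional space and "sqm" follow, so "sqm" is not part of the match
re <- "\\d+(?=\\s?sqm)"
str_extract(string, re)
#> [1] "120"
# str_extract() returns a string; convert it to get a number
as.numeric(str_extract(string, re))
#> [1] 120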
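For a whole column of listings, a base R sketch without stringr might look like the following. extract_sqm() is a hypothetical helper name, and the optional decimal part (?:\\.\\d+)? is an assumption about how areas are written (e.g. "85.5 sqm"). Base R's default regex engine does not support lookahead, so regexpr() is called with perl = TRUE. Texts without a "sqm" figure yield NA.

```r
# Hypothetical helper: pull the number that precedes "sqm" from each text,
# returning NA where no such number is found
extract_sqm <- function(x) {
  # Digits with an optional decimal part, followed by optional space + "sqm";
  # perl = TRUE is required for the lookahead (?=...)
  m <- regexpr("\\d+(?:\\.\\d+)?(?=\\s?sqm)", x, perl = TRUE)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_real_, length(x))
  # regmatches() returns only the matched substrings, in input order
  out[hit] <- as.numeric(regmatches(x, m))
  out
}

listings <- c(
  "This is a wonderful 120 sqm flat with a stunning view",
  "Built in 1995, this 3 bed flat is 85.5 sqm",
  "No floor area given"
)
extract_sqm(listings)
#> [1] 120.0  85.5    NA
```

Areas written with thousands separators (e.g. "1,200 sqm") or other units such as "m2" would need a broader pattern.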
