Isolating specific numeric values in text

I have a large amount of text in a CSV file of property listings to trawl through, and I need to find the floor area of each property in square metres. For example:

string <- "This is a wonderful 120 sqm flat with a stunning view"

I know that I can use the following to extract the numeric value:

sqm <- as.numeric(gsub("\\D", "", string)) 

which returns the numeric value 120, as it should. However, I was wondering whether there is a more robust way to do this, given that the text could contain other, irrelevant numeric values.
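
To make the concern concrete, here is a minimal sketch using a made-up listing (the text is hypothetical, not taken from the actual data). Stripping every non-digit character joins all the digits in the string into a single number:

```r
# Hypothetical listing containing more than one number
listing <- "This 3 bedroom flat on the 2nd floor offers 120 sqm of space"

# Removing all non-digits concatenates "3", "2" and "120" into "32120"
wrong <- as.numeric(gsub("\\D", "", listing))
wrong
```

The result here is 32120 rather than 120, which is why the match needs to be tied to the "sqm" unit.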

Is there some way to search for 'sqm' and return the number that precedes it? Many thanks for any comments.

I believe this regex lookahead should work:

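library(stringr)

string <- "This is a wonderful 120 sqm flat with a stunning view"
# Digits followed by optional whitespace and "sqm"; the lookahead keeps "sqm" out of the match
re <- "\\d+(?=\\s?sqm)"
# stringr 1.0 and later use ICU regular expressions, which support lookaheads
# directly, so the older perl() wrapper is not needed
str_extract(string, re)
# [1] "120"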
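
The same lookahead idea can be sketched in base R, without stringr, for a whole vector of listings. This is only a sketch under some assumptions: the sample `listings` are made up, the pattern is extended to allow a decimal part and any amount of whitespace before "sqm", and only the first matching number per listing is kept.

```r
# Hypothetical listings: one plain, one with several numbers and a decimal area, one with no area
listings <- c(
  "This is a wonderful 120 sqm flat with a stunning view",
  "3 bedroom flat, 85.5sqm, built in 1998",
  "No area given"
)

# Digits with an optional decimal part, followed by optional whitespace and "sqm"
pattern <- "\\d+(?:\\.\\d+)?(?=\\s*sqm)"

# regexpr() locates the first match in each string; -1 marks strings with no match
m <- regexpr(pattern, listings, perl = TRUE, ignore.case = TRUE)

# regmatches() returns only the matched pieces, so fill them in where a match exists
sqm <- rep(NA_real_, length(listings))
sqm[m != -1] <- as.numeric(regmatches(listings, m))
sqm
```

For these three listings, `sqm` holds 120, 85.5 and NA: the "3" and "1998" in the second listing are ignored because they are not followed by "sqm".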
