R: find if number is within range in a character string

I have a string s where "substrings" are divided by a pipe. Substrings might or might not contain numbers. I also have a test character string n that contains a number and might or might not contain letters. See the example below. Note that the spacing can be arbitrary.

I'm trying to drop all substrings where n is not in a range or is not an exact match. I understand that I need to split by -, convert to numbers, and compare low/high to n converted to numeric. Here's my starting point, but then I got stuck getting the final good string out of unl_new.

s = "liquid & bar soap 1.0 - 2.0oz | bar 2- 5.0 oz | liquid soap 1-2oz | dish 1.5oz"
n = "1.5oz"

unl = unlist(strsplit(s,"\\|"))

unl_new = (strsplit(unl,"-"))
unl_new = unlist(gsub("[a-zA-Z]","",unl_new))

Desired output:

"liquid & bar soap 1.0 - 2.0oz | liquid soap 1-2oz | dish 1.5oz"

Am I completely on the wrong path? Thanks!

Don't know if it is general enough, but you might try:

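library(stringr)

# split on the pipe; each element is one substring
splitted <- strsplit(s, "\\|")[[1]]
# pull out "number" or "number - number", then split a range on its dash
ranges <- lapply(
  strsplit(str_extract(splitted, "[0-9\\.]+(\\s*-\\s*[0-9\\.]+|)"), "\\s*-\\s*"),
  as.numeric)
# the number to look for
tomatch <- as.numeric(str_extract(n, "[0-9\\.]+"))
# keep exact matches and ranges [low, high) that contain tomatch
keep <- vapply(ranges, function(x)
  (length(x) == 1 && x == tomatch) ||
    (length(x) == 2 && findInterval(tomatch, x) == 1), TRUE)
paste(splitted[keep], collapse = "|")
#[1] "liquid & bar soap 1.0 - 2.0oz | liquid soap 1-2oz | dish 1.5oz"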
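One thing worth checking with the findInterval() test is what happens when n sits exactly on a range boundary. The sketch below uses base R only; in_closed_range is a hypothetical helper name, not something from the answer above, and it assumes you want both ends of the range to count as a match.

```r
# Behaviour of findInterval() at the edges of the range c(1, 2)
rng <- c(1, 2)

findInterval(1.5, rng)                           # 1: strictly inside
findInterval(1,   rng)                           # 1: lower bound is included
findInterval(2,   rng)                           # 2: upper bound is excluded by default
findInterval(2,   rng, rightmost.closed = TRUE)  # 1: upper bound now included

# Hypothetical helper (an assumption, not from the original answer):
# TRUE when `value` lies in the closed interval [rng[1], rng[2]]
in_closed_range <- function(value, rng) {
  findInterval(value, rng, rightmost.closed = TRUE) == 1
}
```

If "2oz" should match "1-2oz", swapping the findInterval(tomatch, x) == 1 test for something like in_closed_range(tomatch, x) would be one way to get that.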

Here is an option using base R:

## extract the numeric part of n
nn <- as.numeric(gsub("[^0-9|. ]", "", n))
## keep only digits, dots, spaces, pipes and "-" (for intervals),
## then split by |
## for each piece, test the condition to build a logical vector
contains_n <- sapply(strsplit(gsub("[^0-9|. |-]", "", s),'[|]')[[1]],
       function(x){
         yy <- strsplit(x, "-")[[1]]
         yy <- as.numeric(yy[nzchar(yy)])
         ## the condition: exact match, or nn within [low, high]
         (length(yy)==1 && yy==nn) || (length(yy)==2 && nn >= yy[1] && nn <= yy[2])
       })

## split again and use the logical vector to drop the parts
## that don't meet the condition
## paste the result using collapse='|' to get a single string again
paste(strsplit(s,'[|]')[[1]][contains_n],collapse='|')

## [1] "liquid & bar soap 1.0 - 2.0oz | liquid soap 1-2oz | dish 1.5oz"

Here's a method starting from your unl step using stringr:

library(stringr)

unl = unlist(strsplit(s,"\\|"))
n2 <- as.numeric(gsub("[[:alpha:]]*", "", n))
num_lst <- str_extract_all(unl, "\\d\\.?\\d*")
indx <- lapply(num_lst, function(x) {
  if(length(x) == 1) {isTRUE(all.equal(n2, as.numeric(x)))
  } else {n2 >= as.numeric(x[1]) && n2 <= as.numeric(x[2])}})

paste(unl[unlist(indx)], collapse=" | ")
# [1] "liquid & bar soap 1.0 - 2.0oz  |  liquid soap 1-2oz  |  dish 1.5oz"

I also tested it with other amounts like "2.3oz". With n2 we coerce n to numeric for comparison. The variable num_lst isolates the numbers from the character string.

With indx we apply our comparisons over the string numbers. If there is one number, we check whether it equals n2; if there are two, we check that n2 lies between them. I chose not to use the basic == operator to avoid any rounding issues. Instead, isTRUE(all.equal(x, y)) is used.
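To illustrate the rounding concern, here is a small base R sketch comparing == with isTRUE(all.equal()). The values are made up for the demonstration and are not taken from the question.

```r
x <- 0.1 + 0.2

x == 0.3                     # FALSE: the two doubles differ in their last bits
isTRUE(all.equal(x, 0.3))    # TRUE: equal within all.equal()'s default tolerance

# all.equal() returns a character description (not FALSE) when values differ,
# which is why it is wrapped in isTRUE()
isTRUE(all.equal(1.5, 1.6))  # FALSE
```

For numbers parsed directly from strings such as "1.5", == would usually give the same answer; the tolerance matters mostly once arithmetic is involved.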

Finally, the logical index variable indx is used to subset the character string to extract the matches and paste them together with a pipe "|".
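The same steps can be wrapped into one base R function. This is only a sketch under a few assumptions: filter_by_number is a hypothetical name, ranges are treated as closed on both ends, surrounding whitespace is trimmed, and a substring with no numbers (or more than two) is treated as a non-match.

```r
# Hypothetical helper, not part of any answer above: keep the pipe-separated
# parts of `s` whose number (or closed range) matches the number in `n`.
filter_by_number <- function(s, n, sep = "|") {
  num_pattern <- "[0-9]+\\.?[0-9]*"
  # all numbers found in one string, as a numeric vector (possibly empty)
  extract_nums <- function(x) {
    as.numeric(regmatches(x, gregexpr(num_pattern, x))[[1]])
  }

  target <- extract_nums(n)[1]
  stopifnot(!is.na(target))  # `n` must contain a number

  parts <- trimws(strsplit(s, sep, fixed = TRUE)[[1]])

  keep <- vapply(parts, function(part) {
    nums <- extract_nums(part)
    if (length(nums) == 1) {
      isTRUE(all.equal(nums, target))          # exact match
    } else if (length(nums) == 2) {
      target >= nums[1] && target <= nums[2]   # within [low, high]
    } else {
      FALSE  # assumption: zero or 3+ numbers never match
    }
  }, logical(1))

  paste(parts[keep], collapse = paste0(" ", sep, " "))
}

s <- "liquid & bar soap 1.0 - 2.0oz | bar 2- 5.0 oz | liquid soap 1-2oz | dish 1.5oz"
filter_by_number(s, "1.5oz")
# [1] "liquid & bar soap 1.0 - 2.0oz | liquid soap 1-2oz | dish 1.5oz"
```

Trimming each part before pasting is what gives the single-space separators shown in the desired output; drop the trimws() call if the original spacing has to be preserved.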
