
Remove everything except periods and numbers from a string with regex in R

I know there are many questions on Stack Overflow about regex, but I can't accomplish this one easy task with the help I've found. Here's my data:

a<-c("Los Angeles, CA","New York, NY", "San Jose, CA")
b<-c("c(34.0522, 118.2437)","c(40.7128, 74.0059)","c(37.3382, 121.8863)")

df<-data.frame(a,b)
df
                a                    b
1 Los Angeles, CA c(34.0522, 118.2437)
2    New York, NY  c(40.7128, 74.0059)
3    San Jose, CA c(37.3382, 121.8863)

I would like to remove everything except the numbers and the periods, i.e. remove "c", "(" and ")" while keeping the ", " separator. This is what I've tried so far:

str_replace(df$b,"[^0-9.]","" )
[1] "(34.0522, 118.2437)" "(40.7128, 74.0059)"  "(37.3382, 121.8863)"

str_replace(df$b,"[^\\d\\)]+","" )
[1] "34.0522, 118.2437)" "40.7128, 74.0059)"  "37.3382, 121.8863)"
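Both attempts stop early because stringr's str_replace() replaces only the first match in each string, while str_replace_all() replaces every match. Here is a minimal sketch, assuming the data from the question. It keeps digits, periods, commas and spaces, because the desired output still contains the ", " separator:

```r
library(stringr)

a <- c("Los Angeles, CA", "New York, NY", "San Jose, CA")
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# str_replace() stops after the first match; str_replace_all() removes
# every character that is not a digit, period, comma or space
cleaned <- str_replace_all(df$b, "[^0-9., ]", "")
cleaned
#> [1] "34.0522, 118.2437" "40.7128, 74.0059"  "37.3382, 121.8863"
```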

Not sure what's left to try. I would like to end up with the following:

 [1] "34.0522, 118.2437" "40.7128, 74.0059"  "37.3382, 121.8863"

Thanks.

If I understand you correctly, this is what you want:

df$b <- gsub("[^[:digit:]., ]", "", df$b)

or:

df$b <- strsplit(gsub("[^[:digit:]. ]", "", df$b), " +")
> df
                a                 b
1 Los Angeles, CA 34.0522, 118.2437
2    New York, NY  40.7128, 74.0059
3    San Jose, CA 37.3382, 121.8863

Or, if you want all the "numbers" as a numeric vector:

as.numeric(unlist(strsplit(gsub("[^[:digit:]. ]", "", df$b), " +")))
[1]  34.0522 118.2437  40.7128  74.0059  37.3382 121.8863
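The flat numeric vector alternates between the first and second number of each string. A sketch of folding it back into one row per city follows. It assumes every string holds exactly two numbers, and "lat"/"lon" are illustrative column names, not part of the original answer:

```r
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")

# Same pipeline as above: keep digits, periods and spaces, split on spaces
nums <- as.numeric(unlist(strsplit(gsub("[^[:digit:]. ]", "", b), " +")))

# Assumes exactly two numbers per string, so refold the vector row by row;
# the column names are illustrative only
coords <- matrix(nums, ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("lat", "lon")))
coords
```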

Try this:

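# inside [...] the parentheses are literal and "|" is not alternation,
# so a plain class of the three unwanted characters is enough
gsub("[c()]", "", df$b)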
#[1] "34.0522, 118.2437" "40.7128, 74.0059"  "37.3382, 121.8863"

Not a regular expression solution, but a simple one.

The elements of b are R expressions, so loop over each element, parse and evaluate it, and then build the string you want.

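vapply(
  b,
  function(bi) {
    # parse() turns "c(34.0522, 118.2437)" into an R expression,
    # eval() runs it, and toString() joins the numbers with ", "
    toString(eval(parse(text = bi)))
  },
  character(1),
  USE.NAMES = FALSE  # drop the input strings as names
)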
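The same parse-and-evaluate idea can also return numbers instead of strings. In this sketch, vapply() is given numeric(2) as its template, which assumes each expression evaluates to exactly two numbers. Because eval() executes whatever code the strings contain, this approach only suits trusted input:

```r
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")

# Each string is an R call; eval(parse(...)) yields a length-2 numeric vector.
# vapply() checks every result against numeric(2) and stacks them as columns.
m <- vapply(b, function(bi) eval(parse(text = bi)), numeric(2),
            USE.NAMES = FALSE)

coords <- t(m)  # transpose so there is one row per input string
coords
```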

Here is another option with str_extract_all from stringr. Extract the numeric parts into a list with str_extract_all, convert them to numeric, rbind the list elements, and cbind the result with the first column of 'df':

library(stringr)
cbind(df[1], do.call(rbind, 
      lapply(str_extract_all(df$b, "[0-9.]+"), as.numeric)))
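The cbind() above leaves the two numeric columns without meaningful names. The following variant shows each step separately and names the columns. The names "lat" and "lon" are illustrative assumptions, not part of the original answer:

```r
library(stringr)

a <- c("Los Angeles, CA", "New York, NY", "San Jose, CA")
b <- c("c(34.0522, 118.2437)", "c(40.7128, 74.0059)", "c(37.3382, 121.8863)")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# One numeric vector per row, e.g. c(34.0522, 118.2437)
parts <- lapply(str_extract_all(df$b, "[0-9.]+"), as.numeric)

# Stack into a 3 x 2 matrix; the column names are illustrative only
coords <- do.call(rbind, parts)
colnames(coords) <- c("lat", "lon")

out <- cbind(df["a"], coords)
out
```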

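Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.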