How to remove the characters from a string and leave only the numbers in R?
a <- "\n\t\t\t\n\t\t\tNew\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t - \n\t\t\t\t\n\t\t\t\t95\n\t\t\t\tdays\n\t\t\t\n\t\t"
How can I isolate only the number 95 from this string? I tried gsub and str_replace, but they remove the 95 too. I scraped this string from a site using the rvest package.
We can use gsub from base R to remove all characters that are not digits:
gsub("\\D+", "", a)
#[1] "95"
Or, as commented by @G Grothendieck:
gsub("\\D", "", a)
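The two patterns give the same result here: \\D+ replaces each run of non-digits in one go, while \\D replaces every non-digit character individually. A minimal sketch to check this, assuming a shortened stand-in for the string in the question (the names with_plus and without_plus are only illustrative):

```r
# Shortened stand-in for the scraped string from the question
a <- "\n\t\t\tNew\n\t\t\t - \n\t\t\t\t95\n\t\t\t\tdays\n\t\t"

with_plus    <- gsub("\\D+", "", a)  # removes each run of non-digits
without_plus <- gsub("\\D", "", a)   # removes non-digits one at a time

identical(with_plus, without_plus)
#[1] TRUE
```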
Or with str_remove_all from stringr:
library(stringr)
str_remove_all(a, "\\D+")
#[1] "95"
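All of the calls above return a character string, so a conversion is still needed if an actual number is wanted. The sketch below uses base R only and a shortened stand-in for the string in the question; the second input, b, is a hypothetical example added to show a caveat: when the input contains more than one number, removing the non-digits runs the digits together.

```r
# Shortened stand-in for the scraped string from the question
a <- "\n\t\t\tNew\n\t\t\t - \n\t\t\t\t95\n\t\t\t\tdays\n\t\t"

# Strip the non-digits, then convert the remaining text to a number
days <- as.numeric(gsub("\\D+", "", a))
days
#[1] 95

# Caveat: hypothetical input with two numbers; the digits are concatenated
b <- "3 items, 95 days"
merged <- gsub("\\D+", "", b)
merged
#[1] "395"
```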
The previous answers approach the desired output negatively, by defining a pattern for what is to be removed, namely anything that is not a digit (hence \\D with an uppercase D). Here's a positive solution that defines what is to be kept and extracts it via a self-defined function extract:
Define the function, including the pattern to be matched, \\d{2} (i.e., two contiguous digits):
extract <- function(x) unlist(regmatches(x, gregexpr("\\d{2}", x, perl = TRUE)))
Apply the function to the data a:
extract(a)
[1] "95"
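Note that \\d{2} matches exactly two digits at a time, so a longer number such as 1200 would be split into "12" and "00", and a single digit would not match at all. If the length of the number is not known in advance, one possible variant is to match runs of digits with \\d+ instead. This is only a sketch; the function name extract_all and the sample input are hypothetical and not part of the original answer.

```r
# Hypothetical variant of extract(): \\d+ matches digit runs of any length
extract_all <- function(x) unlist(regmatches(x, gregexpr("\\d+", x)))

# Hypothetical input with numbers of different lengths
nums <- extract_all("3 items, 95 days, 1200 views")
# nums is c("3", "95", "1200")

# Convert to numeric if numbers rather than strings are needed
as.numeric(nums)
```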
I was going to suggest readr::parse_number, but then I learned that it fails on the - in this string, so extra work is needed, as explained here.