
How to remove the characters from a string and leave only the numbers in R?

a <- "\n\t\t\t\n\t\t\tNew\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t - \n\t\t\t\t\n\t\t\t\t95\n\t\t\t\tdays\n\t\t\t\n\t\t"

How can I isolate only the number 95 from this string? I tried gsub and str_replace, but they remove the 95 too. I scraped this string from a site with the rvest package.

We can use gsub from base R to remove all characters that are not digits:

gsub("\\D+", "", a)
#[1] "95"

Or, as commented by @G Grothendieck:

gsub("\\D", "", a)

Or with str_remove_all from stringr:

library(stringr)
str_remove_all(a, "\\D+")
#[1] "95"
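
Two details of the removal approach are worth checking before relying on it: the result is a character string, not a number, and deleting every non-digit glues separate numbers together. A minimal sketch, assuming a shortened stand-in string b and a second made-up string two_nums (neither is from the question):

```r
# Shortened stand-in for the scraped string (an assumption, not the original data)
b <- "\n\t\tNew\n\t\t - \n\t\t95\n\t\tdays\n"

# gsub() returns a character string, so convert it if a number is needed
n <- as.numeric(gsub("\\D+", "", b))
n
# [1] 95

# Caveat: with more than one number in the input, the digits are concatenated
two_nums <- "12 apples and 34 pears"
glued <- gsub("\\D+", "", two_nums)
glued
# [1] "1234"
```

If the input can contain several numbers, extracting matches rather than deleting everything else is probably the safer route.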

The previous answers approach the desired output negatively, by defining a pattern for what is to be removed, namely anything that is not a digit (hence \\D with an uppercase D). Here's a positive solution that defines what is to be kept and extracts it via a self-defined function extract:

Define the function, including the pattern to be matched, \\d{2} (i.e., two contiguous digits):

extract <- function(x) unlist(regmatches(x, gregexpr("\\d{2}", x, perl = TRUE)))

Apply the function to the data a:

extract(a)
#[1] "95"
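
The pattern \\d{2} is tied to two-digit numbers; on a three-digit number such as 345 it would return only "34". As a sketch of a more general variant, the hypothetical function extract_all below (not part of the answer) matches runs of one or more digits with \\d+, so each number comes back whole and as a separate element. The strings used are made-up stand-ins:

```r
# Hypothetical generalisation of extract(): match one or more contiguous digits
extract_all <- function(x) unlist(regmatches(x, gregexpr("\\d+", x, perl = TRUE)))

# Shortened stand-in for the scraped string (an assumption, not the original data)
b <- "\n\t\tNew\n\t\t - \n\t\t95\n\t\tdays\n"
extract_all(b)
# [1] "95"

# Numbers of different lengths are returned separately: "7" and "345"
res <- extract_all("7 apples and 345 pears")
```

Wrapping the result in as.numeric() converts the matches to numbers if needed.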

I was going to suggest readr::parse_number, but then I learned that it fails on the - in the string and then requires extra work, as explained here.
