Extract the numeric part of strings of mixed numbers and characters in R

Question

I have a lot of strings, and each of them tends to have the following format: Ab_Cd-001234.txt. I want to replace it with 001234. How can I achieve this in R?

Answer 1

The stringr package has lots of handy shortcuts for this kind of work:

# input data following @agstudy
data <-  c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')

# load library
library(stringr)

# prepare regular expression
regexp <- "[[:digit:]]+"

# process string
str_extract(data, regexp)

Which gives the desired result:

  [1] "001234" "001234"

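To explain the regexp a little:

[[:digit:]] is any single digit from 0 to 9

+ means the previous item (in this case, a digit) will be matched one or more times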
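
As a small base-R illustration of what the + quantifier changes (a sketch only; the variable names are made up for this example, and regexpr/regmatches stand in for str_extract so that no extra package is needed), compare the pattern with and without it:

```r
# Illustration only: base R, no extra packages needed
s <- "Ab_Cd-001234.txt"

# Without '+': only the first single digit is matched
one_digit <- regmatches(s, regexpr("[[:digit:]]", s))

# With '+': the whole first run of consecutive digits is matched
digit_run <- regmatches(s, regexpr("[[:digit:]]+", s))

print(one_digit)  # "0"
print(digit_run)  # "001234"
```

Note that the pattern picks up the first run of digits anywhere in the string, so it relies on the prefix (here Ab_Cd) containing no digits of its own.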

This page is also very useful for this kind of string processing: http://en.wikibooks.org/wiki/R_Programming/Text_Processing

Answer 2

Using gsub or sub you can do this:

gsub('.*-([0-9]+).*','\\1','Ab_Cd-001234.txt')
[1] "001234"

You can also use gregexpr and regmatches (note that with gregexpr the result is a list):

m <- gregexpr('[0-9]+','Ab_Cd-001234.txt')
regmatches('Ab_Cd-001234.txt',m)
[[1]]
[1] "001234"

EDIT: Both methods are vectorized and work on a vector of strings.

x <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
sub('.*-([0-9]+).*','\\1',x)
[1] "001234" "001234"

m <- gregexpr('[0-9]+',x)
regmatches(x,m)
[[1]]
[1] "001234"

[[2]]
[1] "001234"
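
If a plain character vector is more convenient than the list shown above, one possible sketch (not part of the original answer; first_match and flat are illustrative names) is to use regexpr, which finds only the first match per string, or to flatten the gregexpr result with unlist:

```r
x <- c("Ab_Cd-001234.txt", "Ab_Cd-001234.txt")

# regexpr() finds only the first match in each string, so regmatches()
# returns a character vector instead of a list
first_match <- regmatches(x, regexpr("[0-9]+", x))

# Alternatively, flatten the gregexpr() result; this lines up with x
# only when every string contains exactly one run of digits
flat <- unlist(regmatches(x, gregexpr("[0-9]+", x)))

print(first_match)
print(flat)
```

One caveat: with regexpr, regmatches silently drops strings that have no match, so the result can be shorter than the input.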

Answer 3

You could use genXtract from the qdap package. It takes a left string and a right string and extracts the elements between them.

library(qdap)
genXtract("Ab_Cd-001234.txt", "-", ".txt")

Though I much prefer agstudy's answer.

EDIT: Extending the answer to match agstudy's:

x <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
genXtract(x, "-", ".txt")

# $`-  :  .txt1`
# [1] "001234"
# 
# $`-  :  .txt2`
# [1] "001234"

Answer 4

gsub: remove the prefix and the suffix:

gsub(".*-|\\.txt$", "", x)

tools package: use file_path_sans_ext from tools to remove the extension, then use sub to remove the prefix:

library(tools)
sub(".*-", "", file_path_sans_ext(x))

strapplyc: extract the digits after the hyphen and before the dot. See the gsubfn home page for more information:

library(gsubfn)
strapplyc(x, "-(\\d+)\\.", simplify = TRUE)

Note that if a numeric result is wanted, we can use strapply rather than strapplyc, like this:

strapply(x, "-(\\d+)\\.", as.numeric, simplify = TRUE)
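
For comparison, a rough base-R sketch of the same numeric conversion without gsubfn might look like the following. The variable names and the fixed width of 6 used for re-padding are assumptions made for this example only:

```r
x <- c("Ab_Cd-001234.txt", "Ab_Cd-001234.txt")

# Capture the digits between the hyphen and the dot
digits <- sub(".*-([0-9]+)\\..*", "\\1", x)

# Converting to numeric drops the leading zeros
nums <- as.numeric(digits)

# Re-pad to a fixed width if the zeros matter (width 6 is assumed here)
padded <- sprintf("%06d", as.integer(nums))

print(nums)    # 1234 1234
print(padded)  # "001234" "001234"
```

Whether to keep the result as character or numeric depends on whether the leading zeros carry meaning, for example as part of a file ID.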
