R - Manipulating a string based on a pattern

Question

This is the name of a file that I have in R:

> lst.files[1]
[1] "clt_Amon_CanESM2_rcp45_185001-230012.nc"

What I need to do is capture just the part up to and including the 4th underscore, so the result would be:

clt_Amon_CanESM2_rcp45_

How can I get this in R?

Answer 1

Using the qdap package, you can do the following:

x <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"

library(qdap)
beg2char(x, "_", 4, include = TRUE)
# [1] "clt_Amon_CanESM2_rcp45_"
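If installing qdap is not an option, roughly the same result can be sketched in base R by splitting on the underscore and pasting the first four pieces back together. This is a minimal sketch, not qdap's implementation; the helper name up_to_nth_sep and its arguments are hypothetical.

```r
# Hypothetical base-R helper (not part of qdap): keep everything up to and
# including the n-th occurrence of a single-character separator.
up_to_nth_sep <- function(x, sep = "_", n = 4) {
  vapply(x, function(s) {
    # fixed = TRUE treats sep literally instead of as a regex
    parts <- strsplit(s, sep, fixed = TRUE)[[1]]
    # n or fewer pieces: nothing follows the n-th separator, return as-is
    if (length(parts) <= n) return(s)
    # glue the first n pieces back together and re-append the n-th separator
    paste0(paste(parts[seq_len(n)], collapse = sep), sep)
  }, character(1), USE.NAMES = FALSE)
}

x <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"
up_to_nth_sep(x)
# [1] "clt_Amon_CanESM2_rcp45_"
```

Because it loops with vapply(), the helper also accepts a whole vector such as lst.files.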

Answer 2

If you know you always have exactly four underscores, you could do something like this (the pattern needs at least four to match, but with more than four the greedy .* runs on to the last underscore):

# regexec() finds the match; regmatches() extracts it from the same string
regmatches(lst.files[1], regexec(".*_.*_.*_.*_", lst.files[1]))[[1]]
# [1] "clt_Amon_CanESM2_rcp45_"

If there may not always be four underscores, but the trailing part never contains one, you could do something like this:

regmatches(lst.files[1], regexec(".*_", lst.files[1]))[[1]]
# [1] "clt_Amon_CanESM2_rcp45_"

This doesn't require any extra package, just base R.
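The greedy .* matters once a name has more than four underscores. A minimal sketch using a hypothetical file name y with five underscores, comparing the first pattern above with one anchored at the start that repeats [^_]+_ exactly four times:

```r
# Hypothetical file name with five underscores
y <- "clt_Amon_CanESM2_rcp45_r1i1p1_185001-230012.nc"

# Greedy pattern: the match extends to the LAST underscore
regmatches(y, regexec(".*_.*_.*_.*_", y))[[1]]
# [1] "clt_Amon_CanESM2_rcp45_r1i1p1_"

# Anchored, exactly four "non-underscores + underscore" runs; [1] keeps the
# full match and drops the capture group that regexec() also reports
regmatches(y, regexec("^([^_]+_){4}", y))[[1]][1]
# [1] "clt_Amon_CanESM2_rcp45_"

# With fewer than four underscores there is no match at all, and
# regmatches() returns character(0) rather than the original string
regmatches("tas_Amon.nc", regexec(".*_.*_.*_.*_", "tas_Amon.nc"))[[1]]
# character(0)
```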

Answer 3

We can also capture the repeating pattern as a group using sub. Starting at the beginning of the string (^), we match one or more characters that are not an underscore ([^_]+) followed by an underscore (\\_), repeat that 4 times ({4}), and capture the whole run as a group by wrapping it in parentheses; the group is followed by zero or more characters (.*). We replace the entire match with the capture group (\\1) to get the expected output.

sub('^(([^_]+\\_){4}).*', '\\1', str1)
#[1] "clt_Amon_CanESM2_rcp45_"

data

str1 <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"
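Because sub() is vectorised, the same call handles a whole vector of file names, and any element with fewer than four underscores does not match and is returned unchanged. In this sketch the vector files and the helper up_to_nth_underscore are hypothetical; the backslash before _ is dropped because an underscore is not special in a regular expression.

```r
# Hypothetical vector; the second name has only two underscores
files <- c("clt_Amon_CanESM2_rcp45_185001-230012.nc",
           "tas_Amon_mean.nc")

# Non-matching elements come back unchanged
sub('^(([^_]+_){4}).*', '\\1', files)
# [1] "clt_Amon_CanESM2_rcp45_" "tas_Amon_mean.nc"

# Hypothetical helper: build the pattern for any n with sprintf()
up_to_nth_underscore <- function(x, n) {
  sub(sprintf("^(([^_]+_){%d}).*", n), "\\1", x)
}
up_to_nth_underscore(files, 2)
# [1] "clt_Amon_" "tas_Amon_"
```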

Answers (3)

Answer 1: score 2, posted 2015-10-24 03:35:49

Answer 2: score 2, accepted, posted 2015-10-24 03:36:42

Answer 3: score 2, posted 2015-10-24 06:06:19