
R - Manipulate string based on pattern

This is the name of a file that I have in R:

> lst.files[1]
[1] "clt_Amon_CanESM2_rcp45_185001-230012.nc"

What I need to do is capture just the part up to and including the 4th underscore, so the result would be something like this:

clt_Amon_CanESM2_rcp45_

How can I get this in R?

Using the qdap package, you can do the following.

x <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"

library(qdap)
beg2char(x, "_", 4, include = TRUE)
# [1] "clt_Amon_CanESM2_rcp45_"
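
If installing qdap is not an option, the same position-based idea can be sketched in base R. This is only an illustration of cutting at the 4th underscore, not a description of how beg2char is implemented; the names pos and res are made up for the example.

```r
x <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"

# Positions of every literal underscore in x
pos <- gregexpr("_", x, fixed = TRUE)[[1]]

# Cut at the 4th underscore, or return NA if there are fewer than four
res <- if (length(pos) >= 4) substr(x, 1, pos[4]) else NA_character_
res
# [1] "clt_Amon_CanESM2_rcp45_"
```

The length check also covers the case where no underscore is found, since gregexpr then returns a single -1.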

If you know the name always contains exactly four underscores (with more, the greedy .* would run on to the last underscore), then you could do something like this:

regmatches(lst.files[1], regexec(".*_.*_.*_.*_", lst.files[1]))[[1]]
# [1] "clt_Amon_CanESM2_rcp45_"

If the number of underscores may vary, but the part you want to drop contains no underscores, you can simply match up to the last underscore:

regmatches(lst.files[1], regexec(".*_", lst.files[1]))[[1]]
# [1] "clt_Amon_CanESM2_rcp45_"

This doesn't require any extra package, just base R.
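
As a stricter variant (a sketch, not part of the answer above), the pattern can be anchored at the start and written with [^_]* so that it stops at the 4th underscore even when the name contains more than four. The vector files is a hypothetical sample; its second name is invented to show the difference.

```r
# Hypothetical sample names; the second one has more than four underscores.
files <- c("clt_Amon_CanESM2_rcp45_185001-230012.nc",
           "clt_Amon_CanESM2_rcp45_r1i1p1_185001-230012.nc")

# From the start of the string: four runs of non-underscores, each followed by "_"
pattern <- "^([^_]*_){4}"

# regexpr finds the first match in each element; regmatches extracts it
prefix <- regmatches(files, regexpr(pattern, files))
prefix
# [1] "clt_Amon_CanESM2_rcp45_" "clt_Amon_CanESM2_rcp45_"
```

Note that regmatches combined with regexpr silently drops elements that do not match, so the result can be shorter than the input if some names have fewer than four underscores.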

We can also capture the repeating pattern as a group using sub. Starting from the beginning of the string (^), we match one or more characters that are not an underscore ([^_]+) followed by an underscore (\\_), repeat that 4 times ({4}), and capture the whole repeated part as a group by wrapping it in parentheses. The rest of the string is matched by zero or more characters (.*). We replace the entire match with the capture group (\\1) to get the expected output.

sub('^(([^_]+\\_){4}).*', '\\1', str1)
#[1] "clt_Amon_CanESM2_rcp45_"

data

str1 <-  "clt_Amon_CanESM2_rcp45_185001-230012.nc"
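
If the number of underscores to keep varies between use cases, the sub approach can be wrapped in a small helper. The function prefix_through_nth and its arguments n and sep are hypothetical names introduced for this sketch, and it assumes sep is a single character with no special meaning in a regular expression.

```r
# Hypothetical helper: keep everything up to and including the n-th separator.
# Assumes sep is one literal character that is not a regex metacharacter.
prefix_through_nth <- function(x, n = 4, sep = "_") {
  # Builds e.g. "^(([^_]+_){4}).*" from n and sep
  pattern <- sprintf("^(([^%s]+%s){%d}).*", sep, sep, as.integer(n))
  sub(pattern, "\\1", x)
}

str1 <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"
prefix_through_nth(str1)
# [1] "clt_Amon_CanESM2_rcp45_"
prefix_through_nth(str1, n = 2)
# [1] "clt_Amon_"
```

Because sub leaves a string unchanged when the pattern does not match, names with fewer than n separators come back as they were; the function is also vectorized over x, so it can be applied to a whole vector of file names at once.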
