R - Manipulate string based on pattern
This is the name of a file that I have in R:
> lst.files[1]
[1] "clt_Amon_CanESM2_rcp45_185001-230012.nc"
What I need to do is capture just the part up to and including the 4th underscore, so it would be something like this:
clt_Amon_CanESM2_rcp45_
How can I get this in R?
Using the qdap package, you can do the following:
x <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"
library(qdap)
beg2char(x, "_", 4, include = TRUE)
# [1] "clt_Amon_CanESM2_rcp45_"
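If you would rather avoid the qdap dependency, a rough base R equivalent for this single file name splits on the underscores and pastes the first four pieces back together. This is only a sketch: it assumes the name contains at least four underscores (with fewer, parts[1:4] contains NA), and the variable names are illustrative.

```r
x <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"

# Split on every underscore; fixed = TRUE treats "_" as a literal string
parts <- strsplit(x, "_", fixed = TRUE)[[1]]

# Re-join the first four pieces and append the 4th underscore back on
prefix <- paste0(paste(parts[1:4], collapse = "_"), "_")
prefix
# [1] "clt_Amon_CanESM2_rcp45_"
```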
If you know you always have exactly four underscores (the pattern needs at least four to match, but with more of them the greedy .* carries the match to the last one), then you could do something like this:
regmatches(lst.files[1], regexec(".*_.*_.*_.*_", lst.files[1]))[[1]]
# [1] "clt_Amon_CanESM2_rcp45_"
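To see why the greedy pattern is only safe with exactly four underscores, here is a sketch using a hypothetical name with five underscores (the extra r1i1p1 field is made up for illustration). The greedy version runs to the last underscore, while an anchored pattern whose pieces cannot cross an underscore ([^_]*) stops at the 4th.

```r
# Hypothetical file name with five underscores
y <- "clt_Amon_CanESM2_rcp45_r1i1p1_185001-230012.nc"

# Greedy .* lets the match extend to the last underscore
greedy <- regmatches(y, regexec(".*_.*_.*_.*_", y))[[1]]
greedy
# [1] "clt_Amon_CanESM2_rcp45_r1i1p1_"

# Anchored at the start; each repetition is non-underscores plus one underscore
anchored <- regmatches(y, regexpr("^([^_]*_){4}", y))
anchored
# [1] "clt_Amon_CanESM2_rcp45_"
```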
If there may not always be four underscores, but the trailing part never contains one, you could do something like this:
regmatches(lst.files[1], regexec(".*_", lst.files[1]))[[1]]
# [1] "clt_Amon_CanESM2_rcp45_"
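The same "everything up to the last underscore" result can be sketched with sub() by deleting the trailing run of non-underscore characters instead of extracting the prefix. Because sub() is vectorised, this also handles a whole vector of names at once; the files vector below is a hypothetical stand-in for lst.files. Note that a name with no underscore at all would be reduced to "".

```r
# Hypothetical vector of file names standing in for lst.files
files <- c("clt_Amon_CanESM2_rcp45_185001-230012.nc",
           "tas_Amon_MIROC5_rcp85_200601-210012.nc")

# Remove the final run of non-underscore characters at the end of each string
prefixes <- sub("[^_]*$", "", files)
prefixes
# [1] "clt_Amon_CanESM2_rcp45_" "tas_Amon_MIROC5_rcp85_"
```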
This doesn't require any extra package, just base R.
We can also capture the repeating pattern as a group using sub. From the beginning of the string ( ^ ), we match one or more characters that are not an underscore ( [^_]+ ) followed by an underscore ( \\_ ), repeated 4 times ( {4} ). We capture that as a group by wrapping it in parentheses, and follow it with zero or more characters ( .* ) so that the rest of the string is part of the match. We then replace the whole match with the capture group ( \\1 ) to get the expected output.
str1 <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"
sub('^(([^_]+\\_){4}).*', '\\1', str1)
#[1] "clt_Amon_CanESM2_rcp45_"
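If the number of underscores to keep varies, the same sub() idea can be wrapped in a small function that builds the {n} repetition with sprintf(). The function name keep_through_nth_underscore is hypothetical, not part of base R or qdap. Because the pattern uses [^_]+, consecutive underscores break the match, and any string that does not match is returned unchanged by sub().

```r
# Hypothetical helper: keep everything up to and including the n-th underscore
keep_through_nth_underscore <- function(x, n) {
  # Builds e.g. "^(([^_]+_){4}).*" for n = 4
  pattern <- sprintf("^(([^_]+_){%d}).*", n)
  sub(pattern, "\\1", x)
}

str1 <- "clt_Amon_CanESM2_rcp45_185001-230012.nc"
keep_through_nth_underscore(str1, 4)
# [1] "clt_Amon_CanESM2_rcp45_"
keep_through_nth_underscore(str1, 2)
# [1] "clt_Amon_"
```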