R: How to use stringr to extract the substring as the output to mutate a column of strings that begins with a string pattern and ends with a number?
I'm creating a small example to be put into mutate(). I'm not sure why this doesn't work.
> str_extract("rs1234-<b>C</b>","^rs*\\d$")
[1] NA
It'd be great if you could point out my misunderstanding of the language instead of merely providing a solution. I expect to get "rs1234".
The ^rs*\\d$ regex matches:
^ - start of string
rs* - r and zero or more occurrences of the s char
\\d - a digit
$ - end of string.
So, your pattern matches strings like rsssss1, r3, etc. The * quantifier applies only to the s right before it, not to the digit, and $ requires the string to end right after that single digit. "rs1234-<b>C</b>" has four digits followed by more text, so nothing matches and you get NA.
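To see the behaviour described above, here is a minimal sketch that runs the original pattern against the two example strings from the answer plus the input from the question. The vector name `inputs` is only for illustration.

```r
library(stringr)

# Two strings the original pattern accepts, plus the input from the question.
inputs <- c("rsssss1", "r3", "rs1234-<b>C</b>")

# "^rs*\\d$": r, zero or more s, exactly one digit, then end of string.
original <- str_extract(inputs, "^rs*\\d$")
print(original)
# [1] "rsssss1" "r3"      NA
```

The first two strings match in full, while the question's input yields NA because more than one character follows the "rs" prefix before the string ends.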
You need
str_extract("rs1234-<b>C</b>", "^rs\\d+")
where ^rs\\d+ matches rs at the start of string and then one or more digits.
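Since the goal was to use this inside mutate(), here is a rough sketch of how that might look with dplyr. The data frame and the column names `variant` and `rsid` are hypothetical and not taken from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical data; `variant` and `rsid` are made-up column names.
df <- tibble(variant = c("rs1234-<b>C</b>", "rs56-<b>T</b>", "chr1:100"))

# str_extract() is vectorised, so it works on the whole column at once.
result <- df %>%
  mutate(rsid = str_extract(variant, "^rs\\d+"))
print(result)
```

Rows that do not start with rs followed by digits, such as "chr1:100" here, get NA in the new column.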
But what if I just want the substring between "rs" and the last number? What should I do?
You would use rs.*\\d:
str_extract("rs1234-<b>C</b>", "rs.*\\d")
where rs.*\\d matches rs, then any zero or more chars other than line break chars, as many as possible, and then a digit.
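The "as many as possible" part matters once the string contains a later digit. The sketch below uses a made-up input, "rs1234-A5-<b>C</b>", to compare the greedy pattern with a lazy variant (rs.*?\\d) and with the anchored ^rs\\d+. The lazy variant is shown only for contrast and is not part of the answer.

```r
library(stringr)

# Made-up input with a second digit group after the rs number.
x <- "rs1234-A5-<b>C</b>"

greedy   <- str_extract(x, "rs.*\\d")   # .* grabs as much as it can, then backs off to the last digit
lazy     <- str_extract(x, "rs.*?\\d")  # .*? grabs as little as it can, so it stops at the first digit
anchored <- str_extract(x, "^rs\\d+")   # only the run of digits directly after rs

print(c(greedy = greedy, lazy = lazy, anchored = anchored))
#      greedy        lazy    anchored
# "rs1234-A5"       "rs1"    "rs1234"
```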
NOTE: If you need to match line breaks, too, you need to prepend the last pattern with the (?s) inline DOTALL modifier.
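As a small illustration of the DOTALL note, the sketch below uses a made-up two-line string. It also shows stringr's regex(..., dotall = TRUE) helper, which should be equivalent to the inline (?s) modifier.

```r
library(stringr)

# Made-up input with a line break between two digit groups.
x <- "rs12\n34 end"

no_dotall  <- str_extract(x, "rs.*\\d")                       # . stops at the line break
inline     <- str_extract(x, "(?s)rs.*\\d")                   # . may cross the line break
via_helper <- str_extract(x, regex("rs.*\\d", dotall = TRUE)) # same idea via regex()

print(no_dotall)               # [1] "rs12"
print(inline == "rs12\n34")    # [1] TRUE
print(via_helper == inline)    # [1] TRUE
```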