Remove alphanumeric words that start with 2 letters followed by exactly 2 digits
a <- c("it is ZZ10ASDJN123 and ZZ100DD22")
How can I remove the words that start with two letters followed by exactly two digits, without removing alphanumeric words in which the two letters are followed by more than two digits?
Expected output:
"it is and ZZ100DD22"
This code removes only the digits. Please help me get the expected output.
gsub('[[:digit:]]+', '', a)
You may use
gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*\\b", "", a, perl=TRUE)
See the regex demo. An alternative that does not need perl=TRUE (note that it requires a letter or underscore right after the two digits):
gsub("\\s*\\b[A-Za-z]{2}\\d{2}[A-Za-z_]\\w*\\b", "", a)
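The two patterns are not fully interchangeable, and a short comparison may help when choosing between them. The sketch below uses a made-up input string (not from the question) in which "AB12" is two letters plus two digits with nothing after it; the variable names are illustrative only.

```r
# Hypothetical input: "AB12" has nothing after the two digits,
# "AB12cd" has letters after them, "AB123" has a third digit.
x <- "keep AB12 and AB12cd but not AB123"

# Pattern 1: negative lookahead (needs perl = TRUE)
lookahead <- "\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*\\b"
# Pattern 2: a letter or underscore must follow the two digits (default engine)
letter_next <- "\\s*\\b[A-Za-z]{2}\\d{2}[A-Za-z_]\\w*\\b"

res_lookahead <- gsub(lookahead, "", x, perl = TRUE)
res_letter_next <- gsub(letter_next, "", x)

print(res_lookahead)
## => [1] "keep and but not AB123"
print(res_letter_next)
## => [1] "keep AB12 and but not AB123"
```

The lookahead version also removes a bare "AB12", while the alternative keeps it because nothing follows the digits. Which behavior is wanted depends on the data.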
Details
\\s* - 0 or more whitespace chars
\\b - a word boundary
[A-Za-z]{2} - two ASCII letters (use \\p{L} to match any Unicode letters)
\\d{2} - two digits
(?!\\d) - there can be no digit immediately to the right
\\w* - 0 or more letters, digits or underscores
\\b - a word boundary.
Add (*UCP) at the start of the regex to make it fully Unicode-aware.
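To see what the (?!\\d) lookahead contributes, it can help to test the pattern with and without it on individual tokens. This is a minimal sketch; the token list and variable names are assumptions added for illustration, and the leading \\s* is dropped because single tokens carry no whitespace.

```r
# Illustrative tokens: the first two come from the question, the rest are made up
tokens <- c("ZZ10ASDJN123", "ZZ100DD22", "ZZ10", "Z10AB")

with_lookahead    <- "\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*\\b"
without_lookahead <- "\\b[A-Za-z]{2}\\d{2}\\w*\\b"

hits_with    <- grepl(with_lookahead, tokens, perl = TRUE)
hits_without <- grepl(without_lookahead, tokens, perl = TRUE)

print(hits_with)
## => [1]  TRUE FALSE  TRUE FALSE
print(hits_without)
## => [1]  TRUE  TRUE  TRUE FALSE
```

Without the lookahead, \\w* simply absorbs the third digit, so "ZZ100DD22" would be matched and removed as well.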
a <- c("it is ZZ10ASDJN123 and ZZ100DD22")
gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*", "", a, perl=TRUE)
## => [1] "it is and ZZ100DD22"
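One detail worth noting: \\s* only consumes whitespace before the matched word, so a match at the very start of a string leaves a leading space behind. A possible way to handle that, assuming trimmed output is wanted, is to wrap the call in a small helper. The function name remove_codes is hypothetical, not part of the original answer.

```r
# Hypothetical helper: remove the matching words, then trim leftover edge whitespace
remove_codes <- function(x) {
  out <- gsub("\\s*\\b[A-Za-z]{2}\\d{2}(?!\\d)\\w*", "", x, perl = TRUE)
  trimws(out)
}

cleaned <- remove_codes(c("it is ZZ10ASDJN123 and ZZ100DD22",
                          "ZZ10AB starts the string"))
print(cleaned)
## => [1] "it is and ZZ100DD22" "starts the string"
```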