
R: Removing all letter characters after a delimiter in a string

I would like to remove all the letters ([a-z]) that come after a delimiter (e.g. "-") in a string, e.g.:

s <- "abc-10abc"

So to get:

> s2
[1] "abc-10"

How can I do this? Thank you.

gsub("(.*\\d).*", "\\1", s)

The first pattern argument uses () to "capture" a group of characters. Inside the capture group, .* matches any run of characters and \\d matches a digit. Because .* is greedy, the group "captures" everything up to and including the last digit in the string.

Since the pattern argument also includes a .* wildcard after the capture group, the entire original string is matched and targeted for replacement. The replacement argument \\1 says to use the first (and in this case only) capture group from the pattern argument.

Let me know if that's not clear. This is my regex gospel for R regex help: https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
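To see the greedy capture described above in action, here is a small sketch. The second input string is a made-up example, not from the question; it illustrates that the trailing .* drops everything after the last digit, not only letters.

```r
s <- "abc-10abc"

# The greedy .* inside the group backtracks to the last digit in the string,
# and the trailing .* consumes whatever follows it.
s2 <- gsub("(.*\\d).*", "\\1", s)
s2
#[1] "abc-10"

# Hypothetical input: the "_!" after the letters is removed as well.
s3 <- gsub("(.*\\d).*", "\\1", "abc-10abc_!")
s3
#[1] "abc-10"
```

If only letters should be removed, a narrower pattern for the trailing part is the safer choice.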

As Rich Scriven pointed out, you could substitute the trailing .* with [a-z]* to target just the letters a through z after the last digit. You may also want to add the argument ignore.case = TRUE to gsub(), in case not everything is lower case:

gsub("(.*\\-\\d*)[a-z]*", "\\1", s, ignore.case = TRUE)
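As a quick check of the case-insensitive variant, here is a sketch on a made-up mixed-case input (the string "ABC-10XyZ" is an assumption for illustration). The hyphen is written unescaped, which is sufficient outside a bracket expression. The POSIX class [[:alpha:]] is shown as one possible alternative to ignore.case.

```r
s_mixed <- "ABC-10XyZ"  # hypothetical mixed-case input

# [a-z]* with ignore.case = TRUE also matches the upper-case letters.
out_ic <- gsub("(.*-\\d*)[a-z]*", "\\1", s_mixed, ignore.case = TRUE)
out_ic
#[1] "ABC-10"

# [[:alpha:]] covers both cases without the extra argument.
out_class <- gsub("(.*-\\d*)[[:alpha:]]*", "\\1", s_mixed)
out_class
#[1] "ABC-10"
```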

I'm not a regex expert but I believe this follows your pattern.

gsub("(^.*-[^[:alpha:]]*)[[:alpha:]]*", "\\1", s)
#[1] "abc-10"

Explanation:

  1. ^ - beginning of string
  2. ^.* - zero or more of any character, starting at the beginning of the string
  3. - matches the delimiter in your question
  4. [^[:alpha:]]* - the circumflex negates the class [:alpha:], so this matches zero or more non-alphabetic characters
  5. (all of the above) form a pattern group, the first (and only) one
  6. [[:alpha:]]* - matches zero or more alphabetic characters

Then, in the replacement argument, \\1 means to replace the match with the first group, so the [[:alpha:]]* part is left out.
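The same pattern can be applied to a whole character vector. The sketch below wraps it in a helper whose name, strip_trailing_letters, is hypothetical, and the extra inputs are invented to show two edge cases: a string without the delimiter is returned unchanged, and with several delimiters the greedy ^.* anchors on the last one.

```r
# Hypothetical helper; the name is illustrative only.
strip_trailing_letters <- function(x) {
  # Keep everything up to the delimiter plus the non-letters after it,
  # and drop the run of letters that follows.
  gsub("(^.*-[^[:alpha:]]*)[[:alpha:]]*", "\\1", x)
}

x <- c("abc-10abc", "xyz-7", "nodelimiter", "a-b-3cd")
res <- strip_trailing_letters(x)
res
#[1] "abc-10"      "xyz-7"       "nodelimiter" "a-b-3"
```

Note that in "a-b-3cd" the letter "b" after the first delimiter is kept, since only the letters following the last "-" and its non-letter run are removed.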


 