Regex to extract only letters and numbers from a string in R

Question

Hi, I need a regex that extracts numbers and number-plus-letter tokens, if present, from a string.

Ex: "4596 2B FC JAIN BHAWAN" --> I want "4596 2B" as my output.

> gsub("\\S([a-zA-Z])+\\S", "", "4596 2B FC JAIN BHAWAN")
[1] "4596 2B FC  "

I do not understand why the above regex did not replace FC with "".

Any help is appreciated. Thanks.

Answer 1

You are using \\S (capital), which means "not a space". Use the lowercase \\s instead, and use it only once, because your string does not end with a space:

gsub("\\s([a-zA-Z])+", "", "4596 2B FC JAIN BHAWAN")
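
To make the difference concrete, here is a minimal sketch comparing the two patterns on the question's input. The variable names are only illustrative. The original pattern needs a non-space character on both sides of at least one letter, so a match must span at least three non-space characters, which is why the two-letter FC survives.

```r
x <- "4596 2B FC JAIN BHAWAN"

# Original pattern: \S, then one or more letters, then \S again.
# A match therefore needs at least three consecutive non-space
# characters; "FC" has only two, so it is left alone.
original <- gsub("\\S([a-zA-Z])+\\S", "", x)

# Suggested pattern: one whitespace character followed by letters.
fixed <- gsub("\\s([a-zA-Z])+", "", x)

print(original)  # "4596 2B FC  "
print(fixed)     # "4596 2B"
```

Note that the suggested pattern relies on the unwanted words coming after a space; a letters-only word at the very start of the string would not be removed.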

Simon's suggestion lets us see the wood for the trees:

gsub("\\b[a-zA-Z]+\\b", "", "aa 4592 2B FC JAIN BHAWAN")
[1] " 4592 2B   "

though I might need some help to get rid of the initial space. (I could just nest gsub calls, but that seems like cheating.)
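
One way to sidestep the leftover whitespace is to select the wanted tokens instead of deleting the unwanted ones. The following is a sketch under that assumption, not the answerer's method: `extract_numeric_tokens` is a hypothetical helper name, and it assumes that "numbers and numbers + letters" means any whitespace-separated token containing at least one digit.

```r
# Hypothetical helper (not from the original answer): keep only the
# whitespace-separated tokens that contain at least one digit.
extract_numeric_tokens <- function(x) {
  tokens <- strsplit(x, "\\s+")[[1]]       # split on runs of whitespace
  kept <- tokens[grepl("[0-9]", tokens)]   # tokens with a digit somewhere
  paste(kept, collapse = " ")              # rejoin with single spaces
}

print(extract_numeric_tokens("4596 2B FC JAIN BHAWAN"))     # "4596 2B"
print(extract_numeric_tokens("aa 4592 2B FC JAIN BHAWAN"))  # "4592 2B"
```

Because the kept tokens are rejoined with single spaces, no leading or trailing whitespace is left to clean up.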

Answer 1
5, accepted, 2014-05-19 07:49:36
