Regex for extracting only letters and numbers from a string in R

Hi, I need a regex that extracts the numbers and the mixed tokens (numbers + letters), if present, from a string.

Ex: "4596 2B FC JAIN BHAWAN" --> I want "4596 2B" as my output

> gsub("\\S([a-zA-Z])+\\S", "", "4596 2B FC JAIN BHAWAN")
[1] "4596 2B FC  "

I do not understand why the above regex did not replace FC with "".

Any help is appreciated. Thanks

You are using \\S (capital), which means "not a space", so your pattern needs a non-space character on each side of at least one letter. That is at least three characters, and FC has only two. Use the lower-case \\s instead, and only use it once (because your string doesn't end with a space):

gsub("\\s([a-zA-Z])+", "", "4596 2B FC JAIN BHAWAN")
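
A quick check of that behaviour, as a sketch: the variable names x and result are only illustrative, and the capture group from the original pattern is dropped because it is not needed.

```r
x <- "4596 2B FC JAIN BHAWAN"

# Original pattern: \S, then one or more letters, then \S again.
# A match needs at least three consecutive non-space characters,
# so the two-letter token "FC" can never match.
grepl("\\S[a-zA-Z]+\\S", "FC")    # FALSE
grepl("\\S[a-zA-Z]+\\S", "JAIN")  # TRUE

# Corrected pattern: one whitespace character followed by letters only.
result <- gsub("\\s[a-zA-Z]+", "", x)
result
# [1] "4596 2B"
```

Note that this pattern only removes words preceded by whitespace, so a letters-only word at the very start of the string would be left in place.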

Using Simon's suggestion allows us to see the wood for the trees:

gsub("\\b[a-zA-Z]+\\b", "", "aa 4592 2B FC JAIN BHAWAN")
[1] " 4592 2B   "

though I might need some help to get rid of the leftover leading and trailing spaces. (I could just nest gsub calls, but that seems like cheating.)
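
One possible way around the leftover whitespace, sketched under an assumption about the data: instead of deleting the unwanted words, extract the wanted tokens and paste them back together. The assumption (not stated in the question) is that every token to keep starts with one or more digits, optionally followed by letters. perl = TRUE is used here only so that \\b follows PCRE semantics.

```r
x <- "aa 4592 2B FC JAIN BHAWAN"

# Assumption: tokens to keep start with digits, optionally followed
# by letters (e.g. "4592", "2B"). Letters-only words are skipped.
pattern <- "\\b[0-9]+[A-Za-z]*\\b"

# gregexpr() locates every match; regmatches() pulls out the matched text.
tokens <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

# Rejoin with single spaces, so there is no stray whitespace to trim.
result <- paste(tokens, collapse = " ")
result
# [1] "4592 2B"
```

If deleting is preferred over extracting, wrapping the gsub call in trimws() would also strip the leading and trailing whitespace, though that is the nested-call approach again.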
