
Error trapping with regex

I have the following data frame column:

ColumnA <- c("Kuala Lumpur Sector 2 new", "old Jakarta Sector31", "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

and I am extracting the Sector number into a separate column:

gsub(".*Sector ?([0-9]+).*","\\1",ColumnA)

Is there a more elegant way than an if/else statement to handle rows where 'Sector' does not appear?

If the word 'Sector' does not appear in a row, I simply want to set the value for that row to blank.
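To make the problem concrete, here is a minimal sketch of what the original call does when 'Sector' is missing. The two-element input `x` is a hypothetical example, not the asker's data: because the pattern does not match, gsub leaves that element untouched instead of returning a blank.

```r
# Hypothetical input: the first element contains no "Sector"
x <- c("Kuala Lumpur 2 new", "old Jakarta Sector31")

# Same pattern as in the question
res <- gsub(".*Sector ?([0-9]+).*", "\\1", x)

# The non-matching element is returned unchanged, not as ""
print(res)
# [1] "Kuala Lumpur 2 new" "31"
```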

I thought of using str_detect first to check whether 'Sector' is present (TRUE/FALSE), but that is quite an ugly solution.

Thanks for any help.

If the word 'Sector' does not appear on one line I simply want to set the value of that row to blank.

To achieve that, use the alternation operator |:

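ColumnA <- c("Kuala Lumpur 2 new", "old Jakarta Sector31", "Sector 9, 7 Hong Kong", "Jakarta new Sector22")
# First alternative captures the digits after "Sector"; the second matches any other string and captures nothing
gsub("^(?:.*Sector ?([0-9]+).*|.*)$", "\\1", ColumnA)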

Result: [1] "" "31" "9" "22". Because "Kuala Lumpur 2 new" contains no Sector, the second alternative, which has no capturing group, matches the whole string, so \\1 is empty and the string is replaced with a blank.
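As a sketch of how this could feed the separate column mentioned in the question, the same pattern can be applied to a data frame column. The data frame `df` and the column name `Sector` are assumptions made for illustration. Since the pattern is anchored with ^ and $, it can match at most once per string, so sub is used here in place of gsub.

```r
# Hypothetical data frame built from the example vector
df <- data.frame(
  ColumnA = c("Kuala Lumpur 2 new", "old Jakarta Sector31",
              "Sector 9, 7 Hong Kong", "Jakarta new Sector22"),
  stringsAsFactors = FALSE
)

# Either capture the digits after "Sector", or match the whole string
# with the empty alternative so that the replacement is ""
df$Sector <- sub("^(?:.*Sector ?([0-9]+).*|.*)$", "\\1", df$ColumnA)

print(df$Sector)
```

Rows without 'Sector' end up with an empty string in the new column, so no separate if/else pass is needed.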

See the IDEONE demo.

library(stringr)
# Extract the digits after "Sector"; rows without a match come back as NA, which is then replaced with ""
str_replace_na(str_extract(ColumnA, "(?<=Sector\\s{0,10})[0-9]+"), "")

I think this is what you need.
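If stringr is not available, the same extract-then-blank idea can be sketched in base R with regexec and regmatches. This is an alternative illustration rather than part of the answer above, and the variable names `m`, `parts`, and `sectors` are made up for the example.

```r
ColumnA <- c("Kuala Lumpur 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

# regexec records the position of the full match and of each capture group
m <- regexec("Sector ?([0-9]+)", ColumnA)

# regmatches returns character(0) for elements with no match,
# otherwise c(full_match, group_1)
parts <- regmatches(ColumnA, m)

# Take the captured digits when present, "" otherwise
sectors <- vapply(parts, function(p) if (length(p) >= 2) p[2] else "", character(1))

print(sectors)
```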


 