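Trouble with gsub and regex in R
I am using gsub in R to add text into the middle of a string. It works well, but for some reason it throws an error when the position is too large. The code is as follows: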
gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)
Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}'
This code works when the number in the braces (273 in this case) is small, but fails when it is large.
The following reproduces the error:
sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
new_cols <- "happy"
gsub('^(.{125})(.+)$', new_cols, sql) #**Works
gsub('^(.{273})(.+)$', new_cols, sql)
Error in gsub("^(.{273})(.+)$", new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}'
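R's gsub uses the TRE regex library by default. The bounds in a limiting quantifier are valid from 0 to RE_DUP_MAX, which is defined in the TRE code. See this TRE reference:
A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX
It seems that RE_DUP_MAX is set to 255 (see the TRE source file showing #define RE_DUP_MAX 255), so you cannot use a larger number in a {n,m} limiting quantifier.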
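A minimal sketch for checking the bound on a particular R build. The helper name `compiles_with_tre` and the 300-character test string are hypothetical, introduced only for illustration. The value 255 is taken from the TRE source mentioned above, so the second call is wrapped in `tryCatch` rather than assumed to fail.

```r
# Hypothetical 300-character test string.
x <- strrep("a", 300)

# Returns TRUE/FALSE if the default (TRE) engine accepts the pattern,
# or NA if the pattern is rejected with an error.
compiles_with_tre <- function(pattern, text) {
  tryCatch(suppressWarnings(grepl(pattern, text)),
           error = function(e) NA)
}

at_limit   <- compiles_with_tre("^.{255}", x)  # bound equal to the reported RE_DUP_MAX
over_limit <- compiles_with_tre("^.{256}", x)  # expected to be NA if RE_DUP_MAX is 255

print(at_limit)
print(over_limit)
```

If `over_limit` prints NA, the build enforces the 255 limit described above.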
Use the PCRE regex flavor by adding perl = TRUE, and it will work.
R demo:
> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
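Note that the demo returns only "happy" because the replacement string contains no backreferences, so the whole match is replaced. Assuming the goal stated in the question is to insert text at a position, the following sketch keeps both captured groups, and also shows a regex-free alternative with `substr` that sidesteps the quantifier limit entirely. The variables `text` and `pos` are hypothetical stand-ins for the question's `sql` and `loc[1]`.

```r
# Hypothetical input: 273 "a" characters followed by 27 "b" characters.
text <- paste0(strrep("a", 273), strrep("b", 27))
new_cols <- "happy"
pos <- 274  # hypothetical 1-based position to insert before

# Regex approach: PCRE accepts the large bound, and \\1 and \\2 put the
# captured text back on either side of the inserted string.
via_regex <- sub(paste0("^(.{", pos - 1, "})(.+)$"),
                 paste0("\\1", new_cols, "\\2"),
                 text, perl = TRUE)

# Regex-free approach: cut the string at the position and paste the pieces.
via_substr <- paste0(substr(text, 1, pos - 1),
                     new_cols,
                     substr(text, pos, nchar(text)))

print(identical(via_regex, via_substr))
```

The `substr` version may be the safer choice when the inserted text can contain backslashes, since `sub` and `gsub` interpret those in the replacement string.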