
Trouble with gsub and regex in R

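I am using gsub in R to insert text into the middle of a string. It works well, but for some reason it throws an error when the insertion position is too large. The code is below: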

gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)
 Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}' 

This code works fine when the number in the braces (273 in this case) is smaller, but not when it is large.


This reproduces the error:

sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."  
new_cols <- "happy" 
gsub('^(.{125})(.+)$', new_cols, sql)  #**Works
gsub('^(.{273})(.+)$', new_cols, sql) 
 Error in gsub("^(.{273})(.+)$", new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}' 

Background

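R's gsub uses the TRE regex library by default. The bounds in a limiting quantifier are valid from 0 to RE_DUP_MAX, which is defined in the TRE code. See this TRE reference: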

A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX

It seems that RE_DUP_MAX is set to 255 (see the TRE source file showing #define RE_DUP_MAX 255), so you cannot use a larger number in a {n,m} limiting quantifier.
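
To see the bound in practice, the following is a minimal sketch that checks whether a pattern compiles under each engine. The helper name pattern_compiles is hypothetical (it is not part of base R), and the sketch assumes a build of R whose bundled TRE keeps the limit described above.

```r
# Hypothetical helper: TRUE if `pattern` compiles under the chosen engine,
# FALSE if R signals an error (or a warning) while compiling it.
pattern_compiles <- function(pattern, perl = FALSE) {
  tryCatch({
    grepl(pattern, "", perl = perl)  # an empty subject is enough to force compilation
    TRUE
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

pattern_compiles("^(.{125})(.+)$")               # bound within the TRE limit
pattern_compiles("^(.{273})(.+)$")               # the failing pattern from the question
pattern_compiles("^(.{273})(.+)$", perl = TRUE)  # same pattern compiled by PCRE
```

Catching both conditions keeps the check quiet regardless of whether a given R version reports the compilation failure as a warning, an error, or both.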

Use the PCRE regex flavor instead: add perl = TRUE and it will work.

R demo

> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
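
Note that the demo returns only "happy" because the pattern matches the whole string and the replacement contains no backreferences; presumably the asker's real new_cols referred to the captured groups. As a rough sketch of the actual insertion, here are two hypothetical helpers (insert_at and insert_at_regex are illustrative names, not from the original post): one avoids regex entirely, the other keeps the prefix with a \\1 backreference under PCRE.

```r
# Hypothetical helper: insert `text` into `s` after its first `n` characters,
# using plain substring operations (no quantifier limit involved).
insert_at <- function(s, text, n) {
  paste0(substr(s, 1, n), text, substr(s, n + 1, nchar(s)))
}

# Hypothetical regex variant: capture the first n characters and put them
# back with \\1. (?s) makes "." match newlines, which PCRE does not do by default.
insert_at_regex <- function(s, text, n) {
  pattern <- paste0("(?s)^(.{", as.integer(n), "})")
  sub(pattern, paste0("\\1", text), s, perl = TRUE)
}

s <- paste(rep("x", 300), collapse = "")
a <- insert_at(s, "happy", 273)
b <- insert_at_regex(s, "happy", 273)
identical(a, b)  # both approaches should agree on this input
```

The substring version is probably the safer choice for SQL text, since the regex variant treats backslashes in the inserted text as replacement escapes.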

