
Trouble with gsub and regex in R

I am using gsub in R to add text into the middle of a string. It works well, but when the insertion position gets too large it throws an error. The code is below:

gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)
 Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}' 

This code works fine when the number in the braces (273 in this case) is smaller, but not when it is this large.


This produces the error:

sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."  
new_cols <- "happy" 
gsub('^(.{125})(.+)$', new_cols, sql)  #**Works
gsub('^(.{273})(.+)$', new_cols, sql) 
 Error in gsub("^(.{273})(.+)$", new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}' 

Background

By default, R's gsub uses the TRE regex library. The bounds in a limiting quantifier must lie between 0 and RE_DUP_MAX, which is defined in the TRE code. See this TRE reference:

A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX

It seems that RE_DUP_MAX is set to 255 (see this TRE source file showing #define RE_DUP_MAX 255), so a {n,m} limiting quantifier cannot use a larger value.
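To check where the limit sits on a particular R build, a small probe like the one below may help. It is only a sketch: `compiles` is a hypothetical helper name, not part of base R, and it only reports whether a pattern is accepted by the chosen engine. The outcome for the default (TRE) engine is deliberately not asserted here, since it depends on the TRE version bundled with R.

```r
# Hypothetical helper (not part of base R): returns TRUE if `pattern`
# is accepted by the chosen regex engine, FALSE if compiling it errors.
compiles <- function(pattern, perl = FALSE) {
  tryCatch({
    # Matching against an empty string is enough to force compilation.
    suppressWarnings(grepl(pattern, "", perl = perl))
    TRUE
  }, error = function(e) FALSE)
}

# Probe the default engine around the suspected limit of 255.
compiles("^.{255}")
compiles("^.{256}")

# The same bound under PCRE.
compiles("^.{256}", perl = TRUE)
```

If the second call returns FALSE while the first returns TRUE, that is consistent with RE_DUP_MAX being 255 on that build.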

Solution

Use the PCRE regex flavor: add perl = TRUE and it will work.

R demo:

> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
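Note that the demo above replaces the whole string with "happy", because the replacement does not refer back to the captured groups. If the goal is to insert text at a position, as the question describes, something along these lines might be closer. This is a sketch under assumptions: `insert_at` is a hypothetical helper, the 300-character test string is made up for illustration, and the input is assumed to contain no newlines (`.` does not match a newline under PCRE by default).

```r
# Hypothetical helper: insert `text` into `s` just before the
# 1-based character position `pos`, without using a regex at all.
insert_at <- function(s, text, pos) {
  paste0(substr(s, 1, pos - 1), text, substr(s, pos, nchar(s)))
}

# Made-up 300-character input, long enough to exceed the TRE bound.
s <- paste(rep("abcdefghij", 30), collapse = "")

# Plain string slicing: no quantifier limit applies.
out_substr <- insert_at(s, "happy", 274)

# PCRE with backreferences: keep the first 273 characters (\\1),
# add the new text, then keep the rest of the string (\\2).
out_regex <- sub("^(.{273})(.+)$", "\\1happy\\2", s, perl = TRUE)

# Both approaches should produce the same string.
identical(out_substr, out_regex)
```

The `substr` version avoids the regex engine entirely, so it is not affected by the quantifier limit whichever engine is in use.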
