
Trouble with gsub and regex in R

I am using gsub in R to insert text into the middle of a string. It works for small offsets, but when the insertion position gets large it throws an error. The code and the error are below:

gsub(paste0('^(.{', as.integer(loc[1])-1, '})(.+)$'), new_cols, sql)
 Error in gsub(paste0("^(.{273})(.+)$"), new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}' 

This code works fine when the number in the braces (273 in this case) is smaller, but not when it is this large.


This reproduces the error:

sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."  
new_cols <- "happy" 
gsub('^(.{125})(.+)$', new_cols, sql)  # works
gsub('^(.{273})(.+)$', new_cols, sql) 
 Error in gsub("^(.{273})(.+)$", new_cols, sql) : invalid regular expression '^(.{273})(.+)$', reason 'Invalid contents of {}' 

Background

By default, R's gsub uses the TRE regex library. The bounds in a limiting quantifier must lie between 0 and RE_DUP_MAX, which is defined in the TRE code. See this TRE reference:

A bound is one of the following, where n and m are unsigned decimal integers between 0 and RE_DUP_MAX

It seems that RE_DUP_MAX is set to 255 (the TRE source file contains #define RE_DUP_MAX 255), so you cannot use a larger value in a {n,m} limiting quantifier.
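
A quick way to check this on your own R build is to probe which quantifier sizes compile. The following is only a sketch: the helper name compiles and the 300-character test string are made up for illustration, and the TRE outcomes depend on the RE_DUP_MAX value your R was built with.

```r
# A 300-character test string (an illustrative stand-in for `sql`).
x <- strrep("a", 300)

# Hypothetical helper: TRUE if the pattern compiles and runs,
# FALSE if the regex engine raises an error.
compiles <- function(pattern, ...) {
  tryCatch(
    suppressWarnings({ grepl(pattern, x, ...); TRUE }),
    error = function(e) FALSE
  )
}

results <- c(
  tre_125  = compiles("^.{125}"),              # size reported as working above
  tre_273  = compiles("^.{273}"),              # size reported as failing above
  pcre_273 = compiles("^.{273}", perl = TRUE)  # same pattern under PCRE
)
print(results)
```

If tre_273 comes back FALSE while pcre_273 is TRUE, the failure is down to the TRE bound limit and not to the pattern itself.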

Solution

Use the PCRE regex flavor: add perl = TRUE and it will work.

R demo:

> sql <- "The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats.The cat with the bat went to town. He ate the fat mat and wouldn't stop til the sun came up. He was a fat cat that lived with a rat who owned many hats."
> new_cols <- "happy"
> gsub('^(.{273})(.+)$', new_cols, sql, perl=TRUE)
[1] "happy"
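
Note that the demo returns just "happy": the replacement string contains no backreferences, so the whole match is replaced and nothing is inserted. If the goal is to insert text at a position, a minimal sketch could look like the one below. The short sql string, new_cols value and loc are made-up illustrations, and the substr variant is an alternative technique that avoids regex altogether; it is not part of the original answer.

```r
# Illustrative values, not from the original post.
sql      <- "SELECT a, b FROM t"
new_cols <- ", c"
loc      <- 12L   # insert before the 12th character

# Regex approach: keep both captured groups and put the new text between them.
# perl = TRUE avoids the TRE bound limit for large values of loc.
with_regex <- gsub(
  paste0("^(.{", loc - 1L, "})(.+)$"),
  paste0("\\1", new_cols, "\\2"),
  sql,
  perl = TRUE
)

# Regex-free approach: split at the position and paste the pieces together.
with_substr <- paste0(
  substr(sql, 1L, loc - 1L),
  new_cols,
  substr(sql, loc, nchar(sql))
)

print(with_regex)
print(with_substr)
```

The substr version has no repetition limit and does not interpret backslashes in new_cols. It also sidesteps a PCRE detail: with perl = TRUE, the dot does not match a newline unless the pattern starts with (?s), which matters if the SQL text spans several lines.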


 