
How to fill gap between two characters with regex

I have a data set like the one below. I would like to replace all dots between two 1's with 1's, as shown in desired.result. Can I do this with regex in base R?

I tried:

regexpr("^1\\.1$", my.data$my.string, perl = TRUE)

Here is a solution in C#:

Characters between two exact characters

Thank you for any suggestions.

my.data <- read.table(text='
     my.string                           state
     ................1...............1.    A
     ......1..........................1    A
     .............1.....2..............    B
     ......1.................1...2.....    B
     ....1....2........................    B
     1...2.............................    C
     ..........1....................1..    C
     .1............................1...    C
     .................1...........1....    C
     ........1....2....................    C
     ......1........................1..    C
     ....1....1...2....................    D
     ......1....................1......    D
     .................1...2............    D
', header = TRUE, na.strings = 'NA', stringsAsFactors = FALSE)

desired.result <- read.table(text='
     my.string                           state
     ................11111111111111111.    A
     ......1111111111111111111111111111    A
     .............1.....2..............    B
     ......1111111111111111111...2.....    B
     ....1....2........................    B
     1...2.............................    C
     ..........1111111111111111111111..    C
     .111111111111111111111111111111...    C
     .................1111111111111....    C
     ........1....2....................    C
     ......11111111111111111111111111..    C
     ....111111...2....................    D
     ......1111111111111111111111......    D
     .................1...2............    D
', header = TRUE, na.strings = 'NA', stringsAsFactors = FALSE)

Below is an option using gsub with the \\G feature and lookaround assertions.

> gsub('(?:1|\\G(?<!^))\\K\\.(?=\\.*1)', '1', my.data$my.string, perl = TRUE)
# [1] "................11111111111111111." "......1111111111111111111111111111"
# [3] ".............1.....2.............." "......1111111111111111111...2....."
# [5] "....1....2........................" "1...2............................."
# [7] "..........1111111111111111111111.." ".111111111111111111111111111111..."
# [9] ".................1111111111111...." "........1....2...................."
# [11] "......11111111111111111111111111.." "....111111...2...................."
# [13] "......1111111111111111111111......" ".................1...2............"

The \\G feature is an anchor that can match at one of two positions: the start of the string or the position at the end of the last match. Since it seems you want to avoid the dots at the start of the string, we use a lookaround assertion, \\G(?<!^), to exclude the start of the string.

The \\K escape sequence resets the starting point of the reported match, so any previously consumed characters are no longer included.

You can find an overall breakdown that explains the regular expression here.
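To see what \\K and the \\G(?<!^) guard each contribute, the pattern can be tried on a few short strings. This is only a minimal sketch: the test strings and the names pat, pat_no_guard and x are made up for illustration and are not part of the question's data.

```r
# \K: the "a" must be matched, but only the "b" is reported and replaced.
sub("a\\Kb", "X", "ab", perl = TRUE)
# [1] "aX"

# The pattern from the answer above.
pat <- "(?:1|\\G(?<!^))\\K\\.(?=\\.*1)"
# The same pattern without the (?<!^) guard, for comparison only.
pat_no_guard <- "(?:1|\\G)\\K\\.(?=\\.*1)"

# Illustrative strings (assumed, not taken from my.data).
x <- c("..1...1..", "1.2.1", "...1")

# With the guard, only dots that follow a 1 and lead up to another 1 change.
gsub(pat, "1", x, perl = TRUE)
# [1] "..11111.." "1.2.1"     "...1"

# Without the guard, \G also matches at the start of the string,
# so leading dots that are followed by a 1 are filled as well.
gsub(pat_no_guard, "1", x, perl = TRUE)
# [1] "1111111.." "1.2.1"     "1111"
```

In both variants the lookahead (?=\\.*1) is what stops the replacement at a 2 or at trailing dots, since only dots may lie between the current dot and the next 1.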

Using gsubfn, the first argument is a regular expression that matches the 1's and the characters between them, capturing the latter. The second argument is a function, expressed in formula notation, which uses gsub to replace each character in the captured string with 1:

library(gsubfn)
transform(my.data, my.string = gsubfn("1(.*)1", ~ gsub(".", 1, x), my.string))

If a string can contain multiple pairs of 1's, use "1(.*?)1" as the regular expression instead.
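The difference between the greedy and the lazy form is easiest to see by extracting what each one matches. The sketch below uses only base R (regmatches() and gregexpr() rather than gsubfn), and the sample string s is invented for illustration.

```r
# A made-up string containing two separate pairs of 1's.
s <- "..1..1....1..1.."

# Greedy: .* runs on to the last 1, so one match swallows both pairs.
greedy <- regmatches(s, gregexpr("1(.*)1", s, perl = TRUE))[[1]]
greedy
# [1] "1..1....1..1"

# Lazy: .*? stops at the nearest following 1, so each pair matches on its own.
lazy <- regmatches(s, gregexpr("1(.*?)1", s, perl = TRUE))[[1]]
lazy
# [1] "1..1" "1..1"
```

With the greedy pattern the four dots between the two pairs fall inside the match as well, whereas the lazy pattern leaves them outside it.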

Visualization: The regular expression here is simple enough to be understood directly, but here is a Debuggex visualization anyway:

1(.*)1

Regular expression visualization

Debuggex Demo

Here is an option that uses a relatively simple regex and the standard combination of gregexpr(), regmatches(), and regmatches<-() to identify, extract, operate on, and then replace substrings matching that regex.

## Copy the character vector
x <- my.data$my.string
## Find sequences of "."s bracketed on either end by a "1"
m <- gregexpr("(?<=1)\\.+(?=1)", x, perl=TRUE)
## Standard template for operating on and replacing matched substrings
regmatches(x,m) <- sapply(regmatches(x,m), function(X) gsub(".", "1", X))

## Check that it worked
head(x)
# [1] "................11111111111111111." "......1111111111111111111111111111"
# [3] ".............1.....2.............." "......1111111111111111111...2....."
# [5] "....1....2........................" "1...2............................."
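The same steps can be wrapped in a small function so they are easy to reuse and test. This is a sketch under stated assumptions: fill_gaps is a hypothetical helper name (not a base R function), lapply() is swapped in for sapply() so the replacement value stays a list, and the third test string is made up to show several gaps in one string.

```r
# fill_gaps() is a hypothetical helper, not part of base R.
fill_gaps <- function(x) {
  # Locate runs of dots with a "1" immediately before and after them.
  m <- gregexpr("(?<=1)\\.+(?=1)", x, perl = TRUE)
  # lapply() keeps the result a list, one element per input string,
  # whatever the number of matches in each string.
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(s) gsub(".", "1", s, fixed = TRUE))
  x
}

# The first two strings are shortened rows from the question; the third is invented.
fill_gaps(c("....1....1...2....", "1...2....", "..1..1..1.."))
# [1] "....111111...2...." "1...2...."          "..1111111.."
```

Because the lookarounds do not consume the bounding 1's, a 1 that closes one gap can also open the next, which is why the third string is filled across both gaps.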
