Why is my regex backreference in R being reversed when I use one backslash with gsub?
I do not understand why I am required to use two backslashes to prevent a reversal of my backreference. Below, I detail how I discovered my problem:
I wanted to transform a character string that looks like this:
x <- "53/100 000"
And transform it to look like this:
53/100000
Here are a few ideas I had before I came to ask this question:
I thought that I could use the function gsub to remove all spaces that occur after the / character. However, I thought that a regex solution might be more elegant/efficient.
At first, I didn't know how to backreference in regex, so I tried this:
> gsub("/.+\\s",".+",x)
[1] "53.+000"
Then I read on this website that you can backreference captured patterns using \1. So I began to use this:
> gsub("/.+\\s","\1",x)
[1] "53\001000"
Then I realized that the backreference only considers the wildcard match. But I wanted to keep the / character. So I added it back in:
> gsub("/.+\\s","/\1",x)
[1] "53/\001000"
I then tried a bunch of other things, but I fixed it by adding an extra backslash and enclosing my wildcard in parentheses:
> gsub("/(.+)\\s","/\\1",x)
[1] "53/100000"
Moreover, I was able to remove the / character from my replacement by moving the left parenthesis to the beginning of the pattern:
> gsub("(/.+)\\s","\\1",x)
[1] "53/100000"
Hm, so it seemed two things were required: parentheses and an extra backslash. I think I understand the parentheses: they mark the part of the text that the backreference refers to.
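As a small illustration of how parentheses define what each backreference refers to, here is a sketch that splits the same string into three numbered groups. The pattern and the variable names (joined, last_part) are made up for this example and are not from the original attempts.

```r
x <- "53/100 000"

# Each pair of parentheses is a capture group, numbered left to right:
# group 1 = digits before the slash, group 2 = digits after it,
# group 3 = digits after the space.
pattern <- "(\\d+)/(\\d+)\\s(\\d+)"

# Rebuild the string from the groups, leaving out the space.
joined <- gsub(pattern, "\\1/\\2\\3", x)
print(joined)     # "53/100000"

# Keep only the third group.
last_part <- sub(pattern, "\\3", x)
print(last_part)  # "000"
```

Without parentheses there is no group for the replacement to refer to, which is why the parentheses were needed in addition to the extra backslash.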
What I do not understand is why two backslashes are required. According to the reference website, only \1 is required. What's going on here? Why is my backreference being reversed?
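The extra backslash is required so that R doesn't parse the "\1" as an escape sequence before passing it to gsub. In an R string literal, "\1" is an octal escape for the single character with code 1, which is why the output shows \001 instead of the captured text. "\\1" is a backslash followed by the digit 1, and gsub reads that as the regex backreference \1.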
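To make the two parsing stages visible, here is a minimal sketch using the string from the question. The variable names are chosen for illustration, and the raw-string form at the end assumes R 4.0.0 or later.

```r
x <- "53/100 000"

# Stage 1: R parses the string literal before gsub ever sees it.
one_slash <- "\1"    # octal escape: a single character with code 1
two_slash <- "\\1"   # two characters: a backslash and the digit 1

print(nchar(one_slash))      # 1
print(utf8ToInt(one_slash))  # 1
print(nchar(two_slash))      # 2

# Stage 2: gsub interprets whatever characters it actually receives.
bad  <- gsub("(/.+)\\s", one_slash, x)  # inserts the control character
good <- gsub("(/.+)\\s", two_slash, x)  # inserts capture group 1

print(bad)   # "53\001000"
print(good)  # "53/100000"

# Assumes R >= 4.0.0: a raw string avoids doubling the backslash.
raw_form <- r"(\1)"
print(identical(raw_form, two_slash))  # TRUE
```

The same doubling applies to the pattern: "\\s" in R source is the two-character regex \s by the time gsub receives it.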