
Why is my regex backreference in R being reversed when I use one backslash with gsub?

I do not understand why I need two backslashes to stop my backreference from being reversed. Below, I describe how I ran into the problem:

I wanted to transform a character string that looks like this:

x <- "53/100 000"

And transform it to look like this:

53/100000

Here are a few ideas I had before I came to ask this question:

I thought that I could use the function gsub to remove all spaces that occur after the / character. However, I thought that a regex solution might be more elegant and efficient.

At first, I didn't know how to backreference in regex, so I tried this:

> gsub("/.+\\s",".+",x)
[1] "53.+000"

Then I read on this website that you can backreference captured patterns using \1. So I began to use this:

> gsub("/.+\\s","\1",x)
[1] "53\001000"

Then I realized that the backreference only considers the wildcard match. But I wanted to keep the / character. So I added it back in:

> gsub("/.+\\s","/\1",x)
[1] "53/\001000"

I then tried a bunch of other things, but I fixed it by adding an extra backslash and enclosing my wildcard in parentheses:

> gsub("/(.+)\\s","/\\1",x)
[1] "53/100000"

Moreover, I was able to remove the / character from my replacement by moving the left parenthesis to the beginning of the pattern:

> gsub("(/.+)\\s","\\1",x)
[1] "53/100000"

Hm, so it seemed two things were required: parentheses and an extra backslash. I think I understand the parentheses, because I believe they mark the part of the text that you are backreferencing.
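One way to check that reading of the parentheses is to ask R what the group actually captured. This is only a sketch: it assumes x is the string "53/100 000" from above, and the variable name m is made up for illustration. regexec reports the whole match first, then each parenthesized group.

```r
x <- "53/100 000"

# regexec finds the match and the capture groups; regmatches extracts the text.
m <- regmatches(x, regexec("/(.+)\\s", x))[[1]]

# m[1] is the whole match ("/100 " including the space),
# m[2] is what the parentheses captured, i.e. what the backreference 1 refers to.
print(m)
```

So the group holds only the digits between the / and the space, which is why the / has to be written back in the replacement unless it is moved inside the parentheses.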

What I do not understand is why two backslashes are required. The reference website says that only \1 is required. What's going on here? Why is my backreference being reversed?

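The extra backslash is required so that R doesn't parse "\1" as an escape sequence before passing it to gsub. In an R string literal, "\1" is an octal escape for the single character with code 1, which is why the output shows \001 rather than anything being reversed. "\\1" is a backslash followed by 1, which gsub reads as the regex backreference \1.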
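The difference can be seen directly by inspecting the two string literals. This is a minimal sketch, assuming x is the string "53/100 000" from the question; the variable names one_slash, two_slash, broken and fixed are made up for illustration.

```r
x <- "53/100 000"

# In an R string literal, "\1" is an octal escape: one character with code 1.
one_slash <- "\1"
nchar(one_slash)       # 1
utf8ToInt(one_slash)   # 1

# "\\1" is two characters: a backslash and the digit 1.
two_slash <- "\\1"
nchar(two_slash)       # 2

# With one backslash, gsub inserts the control character,
# which R prints as \001 in the result.
broken <- gsub("/.+\\s", "/\1", x)

# With a capture group and two backslashes, gsub sees the backreference.
fixed <- gsub("/(.+)\\s", "/\\1", x)
print(fixed)           # "53/100000"
```

On R 4.0.0 or later, a raw string such as r"(/\1)" should give the same replacement without doubling the backslash.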
