
Java RegExp: Capture part after a character but don't replace the character

I am using Java to parse through a JavaScript file. Because the scope is different than expected in the environment in which I am using it, I am trying to replace every instance of, for example,

test = value

with

window.test = value

Previously, I had just been using

writer.append(js.getSource().replaceAll("test", "window.test"));

which obviously isn't generalizable, but for a fixed dataset it was working fine.

However, in the new files I'm supposed to work with, which are an updated version of the old ones, I now have to deal with

window['test'] = value

and

([[test]])

I don't want to match test in either of those cases, and it seems like those are the only two cases where there's a new format. So my plan was to write a regex that matches anything except ' and [ as the first character. That would be ([^'\\[])test; however, I don't actually want to replace the first character, just make sure it's not one of the two I don't want to match.

This was a new situation for me because I haven't done much replacement with RegExps, just pattern matching. So I looked around and found what I thought was the solution, something called "non-capturing groups". The explanation on the Oracle page sounded like what I was looking for, but when I rewrote my regular expression as (?:[^'\\\\[])test, it behaved exactly as if I hadn't changed anything: it still replaced the character preceding test. I looked around StackOverflow, but what I found just made me more confident that what I was doing should work.

What am I doing wrong that it's not working as expected? Am I misusing the pattern?

If you include an expression for the character in your regex, it will be part of what is matched.
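To see this concretely, here is a small sketch (the class name, helper method, and sample string are made up for illustration) that prints what the non-capturing version of the pattern actually matches. A non-capturing group only means the group gets no number; the character it matches is still consumed as part of the overall match, so replaceAll replaces it too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Returns the full text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String source = "a; test = value"; // hypothetical sample input

        // The match includes the space in front of "test".
        System.out.println("[" + firstMatch("(?:[^'\\[])test", source) + "]");

        // Because the space is part of the match, it is replaced as well.
        System.out.println(source.replaceAll("(?:[^'\\[])test", "window.test"));
    }
}
```

With this input the first line printed is `[ test]` and the second is `a;window.test = value`: the space before test is gone.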

The trick is to reuse what you matched in the replacement String, so that the extra character is replaced by itself.

Try:

writer.append(js.getSource().replaceAll("([^'\\[])test", "$1window.test"));

The $1 in the replacement String is a back reference to what capturing group 1 matched. In this case that is the character preceding test.
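A minimal sketch of the suggestion above, run on a made-up sample string (the class and method names are hypothetical). It also shows a negative lookbehind as an alternative that is not part of this answer: a lookbehind checks the preceding character without making it part of the match, so no back reference is needed. One difference worth knowing is that the capturing-group pattern requires some character before test, so it will not match test at the very start of the input, while the lookbehind version will.

```java
class BackreferenceDemo {
    // Capturing-group version: group 1 holds the character before "test",
    // and $1 writes it back in front of the replacement.
    static String withBackreference(String source) {
        return source.replaceAll("([^'\\[])test", "$1window.test");
    }

    // Alternative (not from the answer): a negative lookbehind rejects
    // "test" when it is preceded by ' or [ without consuming that character.
    static String withLookbehind(String source) {
        return source.replaceAll("(?<!['\\[])test", "window.test");
    }

    public static void main(String[] args) {
        // Hypothetical sample input covering the three forms in the question.
        String source = "x = 1; test = value; window['test'] = value; ([[test]])";
        System.out.println(withBackreference(source));
        System.out.println(withLookbehind(source));

        // At the start of the input only the lookbehind version matches.
        System.out.println(withBackreference("test = 1"));
        System.out.println(withLookbehind("test = 1"));
    }
}
```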

Why not simply test on "(test)(\\s*)=(\\s*)([\\w\\d]+)"? That way you only match "test", then optional whitespace, followed by an '=' sign, more optional whitespace, and a value (in this case consisting of digits, letters, and the underscore character). You can then use the groups (between parentheses) to copy the value, and even the whitespace if required, to your new text.
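As a rough sketch of how the groups could be copied into the replacement (the class name, method name, and sample input are assumptions for illustration, and the replacement string is one possible choice rather than something given in the answer):

```java
class AssignmentDemo {
    // $1 = "test", $2 and $3 = the whitespace around '=', $4 = the value.
    // The replacement puts "window." in front and copies everything else back.
    static String prefixAssignments(String source) {
        return source.replaceAll("(test)(\\s*)=(\\s*)([\\w\\d]+)", "window.$1$2=$3$4");
    }

    public static void main(String[] args) {
        // Hypothetical sample input covering the three forms in the question.
        String source = "test = value; window['test'] = value; ([[test]])";
        System.out.println(prefixAssignments(source));
    }
}
```

In window['test'] and ([[test]]), test is not followed by optional whitespace and an '=' sign, so neither is touched. Note that \d is already covered by \w, so [\\w\\d]+ is equivalent to \\w+. The pattern does not look at what comes before test, so an identifier that merely ends in test would also match.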
