
Multiline RegEx in Java

(My programming question may seem somewhat convoluted, but I see no other solution.)

A text is written in the Eclipse editor. When a self-made Table view plugin for Eclipse is activated, the text quality is checked automatically by a Python script (which I cannot edit) that receives the editor text. Whitespace characters (\n, \t) other than the normal space (' ') are stripped from the editor text, because otherwise the sentences cannot be QA checked. When the script is done, it returns the incorrect sentences to the table.

It is possible to click on a sentence in the table, and the plugin will search the active editor (row by row) for the clicked sentence. This works for single-line sentences. However, multiline sentences cannot be found in the active editor, because all the \n and \t characters are missing from the compiled sentence.

To overcome this problem, I changed the script so it takes the complete editor text as one string. I tried the following:

// Uses java.util.regex.Pattern and java.util.regex.Matcher
String newSentence = tableSentence.replaceAll(" ", "\\s+");
Pattern p = Pattern.compile(newSentence);
Matcher contentMatcher = p.matcher(editorContent); // editorContent is a string
if (contentMatcher.find()) {
  // Get index offset of string and length of string
}

By changing all spaces into \s+, I hoped to get the match. However, this does not work, because the result looks like the following:

  • editorContent: The\nright\n\ttasks.
  • tableSentence: The right tasks.
  • newSentence: Thes+rights+tasks. // After the 'replaceAll' action
  • Should be: The\s+right\s+tasks.

So, my question is: how can I adjust the input for the compiler? I am inexperienced when it comes to Java, so I do not see how to change this. Unfortunately, I also cannot change the Python script to return the full sentences.

Add a third and fourth backslash to your regex, so it looks like this: \\\\s+

Java doesn't have raw (or verbatim) strings, so every backslash in a string literal has to be escaped; the literal "\\\\s+" therefore reaches the regex engine as \\s+, which it treats as an escaped (double) backslash. This should solve the problem of s+ being inserted in place of your spaces.

When you type a regex in code, it goes through these stages:

\\\\s+
 |     # compile time (Java string literal)
 V
\\s+
 |     # replacement-string parsing in replaceAll
 V
\s+    # actual regex used
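
The stages above can be checked with a small sketch. The class and method names here are hypothetical and only serve the illustration; the sample sentence is the one from the question.

```java
class ReplacementEscapeDemo {
    // The literal "\\s+" is the text \s+ at run time. In a replaceAll
    // replacement string, \s means "a literal s", so the backslash is lost.
    static String withTwoBackslashes(String sentence) {
        return sentence.replaceAll(" ", "\\s+");
    }

    // The literal "\\\\s+" is the text \\s+ at run time. In the replacement
    // string, \\ means "a literal backslash", so \s+ is inserted.
    static String withFourBackslashes(String sentence) {
        return sentence.replaceAll(" ", "\\\\s+");
    }

    public static void main(String[] args) {
        String tableSentence = "The right tasks.";
        System.out.println(withTwoBackslashes(tableSentence));  // Thes+rights+tasks.
        System.out.println(withFourBackslashes(tableSentence)); // The\s+right\s+tasks.
    }
}
```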

Updated my answer according to @nhahtdh's comment (fixed the number of backslashes).

You need to use "\\\\s+" instead of "\\s+", since \ is the escape character in the regex replacement string syntax. To specify a literal \ in the replacement text, you need to write \\ in the replacement string, and that doubles up to "\\\\" since \ requires escaping in a Java string literal.

Note that \ just happens to be used as the escape character in regex replacement string syntax in Java. Other languages, such as JavaScript, use $ to escape $, so \ doesn't need to be escaped in JavaScript's regex replacement string.

If you are replacing a match with literal text, you can use Matcher.quoteReplacement to avoid dealing with the escaping in the regex replacement string:

String newSentence = tableSentence.replaceAll(" ", Matcher.quoteReplacement("\\s+"));
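
As a rough illustration of what Matcher.quoteReplacement buys you, the sketch below wraps it in a hypothetical helper (replaceWithLiteral is not part of the original code) and also shows the other special character in replacement strings, $.

```java
import java.util.regex.Matcher;

class QuoteReplacementDemo {
    // Replaces every match of regex with the given text taken literally:
    // quoteReplacement escapes any \ and $ in it before replaceAll sees it.
    static String replaceWithLiteral(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        // Backslash survives without writing four backslashes in the source.
        System.out.println(replaceWithLiteral("The right tasks.", " ", "\\s+")); // The\s+right\s+tasks.
        // Unquoted, "$5" would be read as a reference to capture group 5.
        System.out.println(replaceWithLiteral("cost: X", "X", "$5"));            // cost: $5
    }
}
```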

In this case, since you are searching for a string and replacing it with another string, you can use String.replace instead, which does plain string replacement:

String newSentence = tableSentence.replace(" ", "\\s+");
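
Putting it together, here is a minimal sketch of how the lookup in the editor text might be done. It goes one step beyond the answers above, and that step is an assumption on my part: each word is wrapped in Pattern.quote so that characters such as the final "." in "The right tasks." are matched literally rather than as regex metacharacters. The class name SentenceLocator and its methods are hypothetical, and the sketch assumes that each space in the table sentence stands for one run of whitespace in the editor.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceLocator {
    // Builds a pattern in which every word is matched literally and
    // every gap between words matches any run of whitespace (\s+).
    static Pattern toWhitespaceTolerantPattern(String tableSentence) {
        String[] words = tableSentence.trim().split(" +");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+"); // plain string, no replacement-string escaping involved
            }
            regex.append(Pattern.quote(words[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns {offset, length} of the first match in editorContent, or null if there is none.
    static int[] locate(String editorContent, String tableSentence) {
        Matcher m = toWhitespaceTolerantPattern(tableSentence).matcher(editorContent);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(), m.end() - m.start() };
    }

    public static void main(String[] args) {
        String editorContent = "Intro.\nThe\nright\n\ttasks. Done.";
        int[] hit = locate(editorContent, "The right tasks.");
        System.out.println(hit[0] + ", " + hit[1]); // 7, 17
    }
}
```

The offset and length returned by locate correspond to the values the comment in the question's if block asks for.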
