Regex to exclude a sentence that contains a specific word in Java

I am reading a file that contains lots of information, like the following:

    type dw_3 from u_dw within w_pg6p0012_01
    boolean visible = false
    integer x = 1797
    integer y = 388
    integer width = 887
    integer height = 112
    integer taborder = 0
    boolean bringtotop = true
    string dataobject = "d_pg6p0012_14"
    end type

    type dw_3 from u_dw within w_pg6p0012_01
    integer x = 1797
    integer y = 388
    integer width = 887
    integer height = 112
    integer taborder = 0
    boolean bringtotop = true
    string dataobject = "d_pg6p0012_14"
    end type

I made this regex:

    (?i)type dw_\\d\\s+(.*?)\\s+within(.*?)\\s+(?!boolean visible = false)(.*)

I want to extract all the strings that do not contain "boolean visible = false", but mine returns all of them. I have also tried many similar posts on Stack Overflow, but the results are the same as mine. Please suggest a way.
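Judging from the pattern itself, a likely reason it returns every block is where the negative lookahead sits: the \s+ before it stops right after "within", so (?!boolean visible = false) is tested against the window name (w_pg6p0012_01), not against the next line, and it always succeeds. The sketch below (QuestionRegexCheck, QUESTION_PATTERN, SAMPLE, and countMatches are illustrative names; the sample is the question's two blocks, trimmed) runs the question's pattern unchanged and counts the matches:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionRegexCheck {

    // The regex from the question, unchanged (Java string-literal form).
    static final Pattern QUESTION_PATTERN = Pattern.compile(
            "(?i)type dw_\\d\\s+(.*?)\\s+within(.*?)\\s+(?!boolean visible = false)(.*)");

    // The question's two blocks, trimmed: only the first one is hidden.
    static final String SAMPLE =
            "type dw_3 from u_dw within w_pg6p0012_01\n"
          + "    boolean visible = false\n"
          + "    integer x = 1797\n"
          + "    end type\n"
          + "\n"
          + "type dw_3 from u_dw within w_pg6p0012_01\n"
          + "    integer x = 1797\n"
          + "    end type\n";

    // Counts how many times the pattern is found in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            // group(3) begins right after "within ", at the window name,
            // which is exactly where the negative lookahead was evaluated.
            System.out.println("match; group(3) = " + m.group(3));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(QUESTION_PATTERN, SAMPLE) + " matches");
    }
}
```

It reports one match per block, so the hidden block is not excluded.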

Solution:

    (?i)type\\s+dw_(\\d+|\\w+)\\s+from\\s+.*?within\\s+.*?\\s+(string|integer)?\\s+.*\\s+.*\\s+.*\\s+.*?\\s+.*?\\s+.*?\\s*string\\s+dataobject\\s+=\\s+(.*?)\\s+end\\s+type

This works well in a regex checker, but when I try it in Java it keeps running without giving any output.

It will be much easier (and more readable) if you write a regex that matches "boolean visible = false" and then exclude the lines that contain a match for it.

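import java.nio.file.Files;
import java.util.regex.Pattern;
import java.util.stream.Stream;

Pattern pattern = Pattern.compile("boolean visible = false");
try (Stream<String> lines = Files.lines(filepath)) {  // filepath: a java.nio.file.Path; the stream closes the file
    lines.filter(line -> !pattern.matcher(line).find())  // note the "!"
         .forEach(System.out::println);                  // replace with your own processing
}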

Notes:

  • Because we are using Files#lines(Path), it is not necessary to handle line breaks in the regex; the file has already been split into lines for us.
  • The Matcher#find() method returns whether the given character sequence contains a match for the regex anywhere in it. I believe this is what you want.
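The same "match, then exclude" idea also works on whole type ... end type blocks, which seems to be what the question is after. A minimal sketch, assuming every block starts with "type dw_" and ends with "end type" (BlockFilter, BLOCK, HIDDEN, SAMPLE, and keepVisibleBlocks are hypothetical names). find() is what makes the exclusion work: it looks for the phrase anywhere inside the block, whereas matches() would require the whole block to consist of the phrase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockFilter {

    // One "type dw_... end type" block. (?s) lets '.' cross line breaks and
    // the lazy .*? stops at the first "end type" after the header.
    static final Pattern BLOCK = Pattern.compile("(?s)type\\s+dw_\\w+.*?end type");
    static final Pattern HIDDEN = Pattern.compile("boolean visible = false");

    // Trimmed version of the two blocks from the question.
    static final String SAMPLE =
            "type dw_3 from u_dw within w_pg6p0012_01\n"
          + "    boolean visible = false\n"
          + "    integer x = 1797\n"
          + "    end type\n"
          + "\n"
          + "type dw_3 from u_dw within w_pg6p0012_01\n"
          + "    integer x = 1797\n"
          + "    end type\n";

    // Keeps only the blocks that do NOT contain "boolean visible = false".
    static List<String> keepVisibleBlocks(String text) {
        List<String> kept = new ArrayList<>();
        Matcher block = BLOCK.matcher(text);
        while (block.find()) {
            String b = block.group();
            if (!HIDDEN.matcher(b).find()) {  // note the "!"
                kept.add(b);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        keepVisibleBlocks(SAMPLE).forEach(System.out::println);
    }
}
```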

EDIT:

Now, if you are really intent on using a pure regex, try this:

^((?!boolean visible = false).)+$

This matches an entire (non-empty) line if and only if it does not contain "boolean visible = false" anywhere within it. No fancy backreferences or capture-group semantics are needed to extract the desired text.

See proof by unit tests here: https://regex101.com/r/dbzdMB/1
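In Java, ^ and $ only match at line boundaries when the MULTILINE flag is set (or (?m) is put in the pattern), so applying this regex to the whole file content rather than line by line could look like the sketch below. VisibleLines, SAMPLE, and visibleLines are illustrative names, and the sample is a trimmed block from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VisibleLines {

    // MULTILINE: ^ and $ match at the start/end of every line,
    // not only at the start/end of the whole input.
    static final Pattern NOT_HIDDEN =
            Pattern.compile("^((?!boolean visible = false).)+$", Pattern.MULTILINE);

    static final String SAMPLE = "type dw_3 from u_dw within w_pg6p0012_01\n"
            + "    boolean visible = false\n"
            + "    integer x = 1797\n"
            + "end type";

    // Returns every non-empty line that does not contain the phrase.
    static List<String> visibleLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = NOT_HIDDEN.matcher(text);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        visibleLines(SAMPLE).forEach(System.out::println);
    }
}
```

Because '.' does not match a line break (DOTALL is not set), each match stays within a single line.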


EDIT #2:

Alternatively, if all you are trying to do is get the file text without any "boolean visible = false", you can simply replace every instance of that target string with the empty string.

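import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("boolean visible = false");
Matcher matcher = pattern.matcher(fileAsCharSequence);  // any CharSequence holding the file, e.g. a String or StringBuilder
String output = matcher.replaceAll("");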
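Replacing only the phrase leaves its indentation and line break behind, i.e. a blank line where "boolean visible = false" used to be. If the whole line should disappear, one option (a sketch, not part of the original answer; RemoveHiddenLine, SAMPLE, and removeHiddenLines are hypothetical names) is to anchor the pattern to the line with (?m) and also consume the trailing line break:

```java
import java.util.regex.Pattern;

class RemoveHiddenLine {

    // (?m) : ^ and $ match at line boundaries.
    // \h*  : horizontal whitespace only, so the match never eats the previous line break.
    // \R?  : the line break after the phrase, if there is one.
    static final Pattern HIDDEN_LINE =
            Pattern.compile("(?m)^\\h*boolean visible = false\\h*$\\R?");

    static final String SAMPLE = "type dw_3 from u_dw within w_pg6p0012_01\n"
            + "    boolean visible = false\n"
            + "    integer x = 1797\n"
            + "end type\n";

    static String removeHiddenLines(String text) {
        return HIDDEN_LINE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.print(removeHiddenLines(SAMPLE));
    }
}
```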

You can use this regex:

(\s*boolean visible = false)|(.*)

This defines two capture groups:

  1. The first capture group, (\s*boolean visible = false), catches "boolean visible = false" together with any whitespace before it, including the preceding line break.

  2. The second capture group, (.*), captures everything else, i.e. whatever the first group does not capture.

Now, when you extract the matches, take only the second group and ignore the first one.
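One detail to watch when looping with find(): (.*) can match the empty string, so besides the real lines, find() also reports a zero-length group(2) at each line break that group 1 did not consume, plus one at the very end of the input. A plain != null check lets those through as blank output lines. The sketch below (TwoGroupFilter, SAMPLE, and keptLines are illustrative names) collects group 2 with and without skipping the empty values:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupFilter {

    // Same pattern as above: group 1 = the hidden line, group 2 = anything else.
    static final Pattern PATTERN = Pattern.compile("(\\s*boolean visible = false)|(.*)");

    static final String SAMPLE = "type dw_3\n"
            + "    boolean visible = false\n"
            + "    integer x = 1797\n"
            + "end type";

    // Collects group 2 values; optionally drops the zero-length ones.
    static List<String> keptLines(String text, boolean skipEmpty) {
        List<String> kept = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            String other = m.group(2);
            if (other == null) {
                continue;  // group 1 matched: this is the hidden line
            }
            if (skipEmpty && other.isEmpty()) {
                continue;  // zero-length match of (.*) at a line break or at the end
            }
            kept.add(other);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println("with empty matches:    " + keptLines(SAMPLE, false));
        System.out.println("without empty matches: " + keptLines(SAMPLE, true));
    }
}
```

With skipEmpty set to true, only the three non-hidden lines remain.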


Edit

Here's an example for clarification. In this example:

  • the getOriginalFileContents() method returns the file contents shown in the question as a hard-coded string;
  • notice how we get both groups, but ignore the first group and print only the second one.

As the output shows, the "boolean visible = false" line is gone.

Output

 type dw_3 from u_dw within w_pg6p0012_01
 integer x = 1797
 integer y = 388
 integer width = 887
 integer height = 112
 integer taborder = 0
 boolean bringtotop = true
 string dataobject = "d_pg6p0012_14"
 end type


 type dw_3 from u_dw within w_pg6p0012_01
 integer x = 1797
 integer y = 388
 integer width = 887
 integer height = 112
 integer taborder = 0
 boolean bringtotop = true
 string dataobject = "d_pg6p0012_14"
 end type

Java implementation

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {

    public static void main(String[] args) {
        String file = getOriginalFileContents();
        Pattern pattern = Pattern.compile("(\\s*boolean visible = false)|(.*)");
        Matcher matcher = pattern.matcher(file);
        while (matcher.find()) {
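            // group(1) holds the "boolean visible = false" line and is ignored.
            // (.*) can also match an empty string at a line break, so skip empty group(2) values.
            String other = matcher.group(2);
            if (other != null && !other.isEmpty()) System.out.println(other);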
        }
    }

    // Returns the file contents as displayed in the question.
    private static String getOriginalFileContents() {
        String s = "     type dw_3 from u_dw within w_pg6p0012_01\n" +
            "     boolean visible = false\n" +
            "     integer x = 1797\n" +
            "     integer y = 388\n" +
            "     integer width = 887\n" +
            "     integer height = 112\n" +
            "     integer taborder = 0\n" +
            "     boolean bringtotop = true\n" +
            "     string dataobject = \"d_pg6p0012_14\"\n" +
            "     end type\n" +
            "     \n" +
            "     type dw_3 from u_dw within w_pg6p0012_01\n" +
            "     integer x = 1797\n" +
            "     integer y = 388\n" +
            "     integer width = 887\n" +
            "     integer height = 112\n" +
            "     integer taborder = 0\n" +
            "     boolean bringtotop = true\n" +
            "     string dataobject = \"d_pg6p0012_14\"\n" +
            "     end type";

        return s;
    }
}

type dw_\d\s+(.*?)\s+within(.*)\n(?!\s*boolean visible = false\s*)[\s\S]*?\s+end type

Try this. See the demo:

https://regex101.com/r/Heex8W/1
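In Java source, the backslashes in this pattern have to be doubled. A sketch of running it over the question's blocks (LookaheadBlocks, SAMPLE, and visibleBlocks are illustrative names; the sample is trimmed). Note that it only excludes a block when "boolean visible = false" is the line directly after the type header, because that is where the lookahead is placed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadBlocks {

    // The pattern from the answer above, written as a Java string literal.
    static final Pattern VISIBLE_BLOCK = Pattern.compile(
            "type dw_\\d\\s+(.*?)\\s+within(.*)\\n"
            + "(?!\\s*boolean visible = false\\s*)[\\s\\S]*?\\s+end type");

    static final String SAMPLE =
            "type dw_3 from u_dw within w_pg6p0012_01\n"
          + "    boolean visible = false\n"
          + "    integer x = 1797\n"
          + "    end type\n"
          + "\n"
          + "type dw_3 from u_dw within w_pg6p0012_01\n"
          + "    integer x = 1797\n"
          + "    end type\n";

    // Collects every block the pattern accepts.
    static List<String> visibleBlocks(String text) {
        List<String> blocks = new ArrayList<>();
        Matcher m = VISIBLE_BLOCK.matcher(text);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        visibleBlocks(SAMPLE).forEach(System.out::println);
    }
}
```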
