Regex to exclude a sentence which contains a specific word in Java
I am reading a file which contains lots of information, as shown below:
type dw_3 from u_dw within w_pg6p0012_01
boolean visible = false
integer x = 1797
integer y = 388
integer width = 887
integer height = 112
integer taborder = 0
boolean bringtotop = true
string dataobject = "d_pg6p0012_14"
end type
type dw_3 from u_dw within w_pg6p0012_01
integer x = 1797
integer y = 388
integer width = 887
integer height = 112
integer taborder = 0
boolean bringtotop = true
string dataobject = "d_pg6p0012_14"
end type
I wrote this regex:
(?i)type dw_\\d\\s+(.*?)\\s+within(.*?)\\s+(?!boolean visible = false)(.*)
I want to extract all the blocks that do not contain "boolean visible = false", but my regex returns all of them. I also tried the suggestions from many similar posts on Stack Overflow, but the result is the same as mine. Please suggest a way.
Solution:
(?i)type\\s+dw_(\\d+|\\w+)\\s+from\\s+.*?within\\s+.*?\\s+(string|integer)?\\s+.*\\s+.*\\s+.*\\s+.*?\\s+.*?\\s+.*?\\s*string\\s+dataobject\\s+=\\s+(.*?)\\s+end\\s+type
This works well in a regex checker, but when I tried it in Java it keeps running without producing any output.
It will be much easier (and more readable) if you write a regex that matches "boolean visible = false" and then exclude the lines that contain a match for it.
Pattern pattern = Pattern.compile("boolean visible = false");
Files.lines(filepath) // filepath is a java.nio.file.Path
    .filter(line -> !pattern.matcher(line).find()) // note the "!"
    .forEach(/* do stuff */);
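For a version that can be run as-is, here is a minimal sketch of the same filter. The class and method names (LineFilterSketch, keepLinesWithoutMatch) and the temporary sample file are made up for illustration. The try-with-resources block is my own addition, on the assumption that you want the file handle behind Files.lines released as soon as the stream is consumed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LineFilterSketch {
    private static final Pattern HIDDEN = Pattern.compile("boolean visible = false");

    // Keeps only the lines in which the pattern is NOT found anywhere.
    static List<String> keepLinesWithoutMatch(Stream<String> lines) {
        return lines
            .filter(line -> !HIDDEN.matcher(line).find())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: written to a temp file so the sketch needs no setup.
        Path filepath = Files.createTempFile("dw_sample", ".txt");
        Files.write(filepath, Arrays.asList(
            "type dw_3 from u_dw within w_pg6p0012_01",
            "boolean visible = false",
            "integer x = 1797",
            "end type"));
        try (Stream<String> lines = Files.lines(filepath)) {
            keepLinesWithoutMatch(lines).forEach(System.out::println);
        } finally {
            Files.deleteIfExists(filepath);
        }
    }
}
```

Taking a Stream<String> rather than a Path keeps the filtering step separate from file handling, so the same method works on lines from any source.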
Notes:
Files#lines(Path) already streams the file line by line, so it is not necessary to break apart separate lines in the regex. This is already done for us.
The Matcher#find() method returns whether the given character sequence contains a match for the regex anywhere in it. I believe this is what you want.
EDIT:
Now, if you are really intent on using a pure regex, then try this:
^((?!boolean visible = false).)+$
This will match an entire (non-empty) line if and only if it does not contain "boolean visible = false" anywhere within it. No fancy backreferences or capture-group semantics are needed to extract the desired text.
See proof by unit tests here: https://regex101.com/r/dbzdMB/1
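To apply this pattern to a whole file held in a single string, the ^ and $ anchors have to match at every line boundary, which in Java means compiling with Pattern.MULTILINE. A minimal sketch, with hypothetical class and method names and a shortened sample input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PureRegexSketch {
    // MULTILINE lets ^ and $ anchor at each line, not only at the ends of the input.
    private static final Pattern LINE_WITHOUT_TARGET =
        Pattern.compile("^((?!boolean visible = false).)+$", Pattern.MULTILINE);

    // Returns every non-empty line that does not contain the target text.
    static List<String> matchingLines(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = LINE_WITHOUT_TARGET.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group()); // group() is the whole matched line
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "type dw_3 from u_dw within w_pg6p0012_01\n"
            + "boolean visible = false\n"
            + "integer x = 1797\n"
            + "end type";
        matchingLines(text).forEach(System.out::println);
    }
}
```

Without the MULTILINE flag, the pattern would have to match the entire input from start to end, so it would find nothing in a multi-line string.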
EDIT #2:
Alternatively, if all you are trying to do is get the file text without any "boolean visible = false", then you could simply replace every instance of that target string with the empty string.
Pattern pattern = Pattern.compile("boolean visible = false");
Matcher matcher = pattern.matcher(fileAsCharSequence); // e.g. StringBuilder
String output = matcher.replaceAll("");
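One thing to be aware of: replacing only the matched text leaves an empty line where the statement used to be. The sketch below contrasts that with a variant that also consumes the indentation and the line break. The second pattern and all names here are my own assumptions, not part of the snippet above.

```java
import java.util.regex.Pattern;

class ReplaceSketch {
    // Pattern from the answer: removes only the text itself.
    private static final Pattern TARGET = Pattern.compile("boolean visible = false");

    // Assumed variant: (?m) makes ^ match at each line start, [ \t]* covers
    // indentation and trailing blanks, and \R? (Java 8+) consumes the line break.
    private static final Pattern TARGET_LINE =
        Pattern.compile("(?m)^[ \\t]*boolean visible = false[ \\t]*\\R?");

    static String removeText(CharSequence input) {
        return TARGET.matcher(input).replaceAll("");
    }

    static String removeLine(CharSequence input) {
        return TARGET_LINE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String file = "type dw_3 from u_dw within w_pg6p0012_01\n"
            + "boolean visible = false\n"
            + "integer x = 1797\n"
            + "end type";
        System.out.println(removeText(file)); // an empty line remains
        System.out.println("----");
        System.out.println(removeLine(file)); // the line is gone entirely
    }
}
```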
You can use this regex:
(\s*boolean visible = false)|(.*)
This basically defines 2 capture groups:
The first capture group, (\s*boolean visible = false), will capture boolean visible = false.
The second capture group, (.*), will capture everything else, that is, everything not captured by the first group.
Now when you're extracting, just read the second group and ignore the first one.
Edit
Here's an example for clarification:
In this example, see the output, which is without the line boolean visible = false.
Output
type dw_3 from u_dw within w_pg6p0012_01
integer x = 1797
integer y = 388
integer width = 887
integer height = 112
integer taborder = 0
boolean bringtotop = true
string dataobject = "d_pg6p0012_14"
end type
type dw_3 from u_dw within w_pg6p0012_01
integer x = 1797
integer y = 388
integer width = 887
integer height = 112
integer taborder = 0
boolean bringtotop = true
string dataobject = "d_pg6p0012_14"
end type
Java Implementation
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {
    public static void main(String[] args) {
        String file = getOriginalFileContents();
        Pattern pattern = Pattern.compile("(\\s*boolean visible = false)|(.*)");
        Matcher matcher = pattern.matcher(file);
        while (matcher.find()) {
            //System.out.print(matcher.group(1)); //ignore this group
            if (matcher.group(2) != null) System.out.println(matcher.group(2));
        }
    }

    // This method just gets the file contents as displayed in the question.
    private static String getOriginalFileContents() {
        String s = " type dw_3 from u_dw within w_pg6p0012_01\n" +
            " boolean visible = false\n" +
            " integer x = 1797\n" +
            " integer y = 388\n" +
            " integer width = 887\n" +
            " integer height = 112\n" +
            " integer taborder = 0\n" +
            " boolean bringtotop = true\n" +
            " string dataobject = \"d_pg6p0012_14\"\n" +
            " end type\n" +
            " \n" +
            " type dw_3 from u_dw within w_pg6p0012_01\n" +
            " integer x = 1797\n" +
            " integer y = 388\n" +
            " integer width = 887\n" +
            " integer height = 112\n" +
            " integer taborder = 0\n" +
            " boolean bringtotop = true\n" +
            " string dataobject = \"d_pg6p0012_14\"\n" +
            " end type";
        return s;
    }
}
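A caveat with this loop: after the (.*) alternative consumes a line, the next find() starts just before the line break, where (.*) can match the empty string. Group 2 is therefore sometimes empty, and blank lines get printed between the real ones. A hedged variant that collects only the non-blank group-2 matches might look like this (class and method names are hypothetical, and the sample input is shortened):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoGroupSketch {
    private static final Pattern PATTERN =
        Pattern.compile("(\\s*boolean visible = false)|(.*)");

    // Collects group 2 for every match, skipping null and whitespace-only results.
    static List<String> keptLines(String file) {
        List<String> kept = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(file);
        while (matcher.find()) {
            String rest = matcher.group(2); // null when group 1 matched instead
            if (rest != null && !rest.trim().isEmpty()) {
                kept.add(rest);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String file = " type dw_3 from u_dw within w_pg6p0012_01\n"
            + " boolean visible = false\n"
            + " integer x = 1797\n"
            + " end type";
        keptLines(file).forEach(System.out::println);
    }
}
```

Note that this also drops lines that were blank in the original file, which may or may not be what you want.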
type dw_\d\s+(.*?)\s+within(.*)\n(?!\s*boolean visible = false\s*)[\s\S]*?\s+end type
Try this. See demo.
https://regex101.com/r/Heex8W/1
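A rough sketch of how this pattern could be used from Java to pull out whole blocks. Two assumptions to keep in mind: the negative lookahead sits directly after the header line's newline, so a block is only rejected when boolean visible = false is its first property line, and the literal \n assumes Unix line endings. The class and method names, and the shortened sample blocks, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockRegexSketch {
    // Same regex as above, with backslashes doubled for a Java string literal.
    private static final Pattern BLOCK = Pattern.compile(
        "type dw_\\d\\s+(.*?)\\s+within(.*)\\n"
        + "(?!\\s*boolean visible = false\\s*)[\\s\\S]*?\\s+end type");

    // Returns each "type ... end type" block whose first property line
    // is not "boolean visible = false".
    static List<String> visibleBlocks(String file) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(file);
        while (matcher.find()) {
            blocks.add(matcher.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String file = "type dw_3 from u_dw within w_pg6p0012_01\n"
            + "boolean visible = false\n"
            + "integer x = 1797\n"
            + "end type\n"
            + "type dw_3 from u_dw within w_pg6p0012_01\n"
            + "integer x = 1797\n"
            + "end type";
        visibleBlocks(file).forEach(System.out::println);
    }
}
```

If the unwanted line can appear anywhere inside a block, the lookahead would need to be checked at every position of the block body rather than once after the header.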