Java Regex Problem
I have a large HTML file as input text, from which I have to extract some information using pattern matching. The "region" looks roughly like this:
some html text
<div debugState" style="display: none;">
Model: ModelCode[BR324]
Features: [S08TL, S0230, S0851, S0428, S01CD, S0879, S01CA, S08SP, S0698, S01CB, S0548, S08SC, S08TM, S01CC, S0801, S0258, P0668, S04AK]
Packages: [S0801]
</div>
some html text
I wrote the following code (debInfo is the HTML source to be scanned):
Pattern model = Pattern.compile(".*(Model: ModelCode\\[\\w\\]).*, Pattern.DOTALL");
Pattern features = Pattern.compile(".*(Features: \\[\\w*\\]).*, Pattern.DOTALL");
Pattern packages = Pattern.compile(".*(Packages: \\[\\w*\\]).*, Pattern.DOTALL");
Matcher m1 = model.matcher(debInfo);
Matcher m2 = features.matcher(debInfo);
Matcher m3 = packages.matcher(debInfo);
boolean a = m1.matches();
boolean b = m2.matches();
boolean c = m3.matches();
System.out.println("matches(); " + a + " " + b + " " + c + " " + "\n" + debInfo);
But I am getting no match :-(. What am I doing wrong? Thanks a lot in advance!
You use \\w inside your (correctly escaped) square brackets. That matches only a single character. Try \\w+ or \\w* instead.
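To illustrate the difference between \\w and \\w+, here is a minimal sketch. The class name WordQuantifierDemo and the helper fullMatch are made up for this example; only the input line comes from the question.

```java
import java.util.regex.Pattern;

public class WordQuantifierDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "Model: ModelCode[BR324]";

        // \\w matches exactly one word character, so "BR324" does not fit.
        System.out.println(fullMatch("Model: ModelCode\\[\\w\\]", input));  // false

        // \\w+ matches one or more word characters.
        System.out.println(fullMatch("Model: ModelCode\\[\\w+\\]", input)); // true
    }
}
```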
Also, you have put the Pattern.DOTALL flag inside your String literal, which I think is a typo. It should be the second argument to Pattern.compile, after the closing quote:
Pattern model = Pattern.compile(".*(Model: ModelCode\\[\\w+\\]).*", Pattern.DOTALL);
Also note that \\w* will not work for the comma-and-space separated list of Features; you'll need something like [\\w\\s,]*.
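A small sketch of why the character class is needed for the Features list. The class name FeaturesListDemo and the helper extractFeatures are hypothetical, and the feature list is shortened from the one in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeaturesListDemo {
    // Hypothetical helper: returns the text between the brackets after
    // "Features: ", or null when no such region is found.
    static String extractFeatures(String text) {
        Pattern p = Pattern.compile("Features: \\[([\\w\\s,]*)\\]");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened version of the list from the question.
        String line = "Features: [S08TL, S0230, S0851]";

        // \\w* stops at the first comma, so the closing bracket is never reached.
        boolean wordOnly = Pattern.compile("Features: \\[\\w*\\]").matcher(line).find();
        System.out.println(wordOnly);              // false

        // [\\w\\s,]* also accepts whitespace and commas.
        System.out.println(extractFeatures(line)); // S08TL, S0230, S0851
    }
}
```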
I think you need to use:
Pattern model = Pattern.compile(".*(Model: ModelCode\\[\\w*\\]).*", Pattern.DOTALL);
Pattern features = Pattern.compile(".*(Features: \\[\\w*\\]).*", Pattern.DOTALL);
Pattern packages = Pattern.compile(".*(Packages: \\[\\w*\\]).*", Pattern.DOTALL);
These are the correct patterns:
Pattern modelPattern = Pattern.compile(".*Model: ModelCode\\[(\\w*)\\].*",
Pattern.DOTALL | Pattern.MULTILINE);
Pattern featuresPattern = Pattern.compile(".*Features: \\[([\\w\\s,]*)\\].*",
Pattern.DOTALL | Pattern.MULTILINE);
Pattern packagesPattern = Pattern.compile(".*Packages: \\[([\\w\\s,]*)\\].*",
Pattern.DOTALL | Pattern.MULTILINE);
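Here is a sketch of how the capture groups in patterns like these might be read back with matches() and group(1). The class name DebugStateExtractor, the helper firstGroup, and the SAMPLE string are assumptions for illustration: the div tag is simplified and the feature list is shortened compared to the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DebugStateExtractor {
    static final int FLAGS = Pattern.DOTALL | Pattern.MULTILINE;

    static final Pattern MODEL =
            Pattern.compile(".*Model: ModelCode\\[(\\w*)\\].*", FLAGS);
    static final Pattern FEATURES =
            Pattern.compile(".*Features: \\[([\\w\\s,]*)\\].*", FLAGS);
    static final Pattern PACKAGES =
            Pattern.compile(".*Packages: \\[([\\w\\s,]*)\\].*", FLAGS);

    // Assumed sample input: simplified div tag, shortened feature list.
    static final String SAMPLE = "some html text\n"
            + "<div style=\"display: none;\">\n"
            + "Model: ModelCode[BR324]\n"
            + "Features: [S08TL, S0230, S0851]\n"
            + "Packages: [S0801]\n"
            + "</div>\n"
            + "some html text";

    // Hypothetical helper: capture group 1 if the pattern matches the
    // whole input, otherwise null.
    static String firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup(MODEL, SAMPLE));    // BR324
        System.out.println(firstGroup(FEATURES, SAMPLE)); // S08TL, S0230, S0851
        System.out.println(firstGroup(PACKAGES, SAMPLE)); // S0801
    }
}
```

Because the parentheses enclose only the bracket contents, group(1) returns the value without the surrounding label.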
It is missing the MULTILINE switch.
Pattern modelPattern = Pattern.compile(".*(Model: ModelCode\\[\\w*\\]).*", Pattern.DOTALL | Pattern.MULTILINE);
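For comparison, a quick sketch that tries this pattern with different flag combinations. As far as the java.util.regex documentation goes, MULTILINE only changes how ^ and $ behave, while DOTALL is what lets . match line terminators, so for a pattern without anchors the DOTALL flag should be the one that matters. The class name FlagComparisonDemo, the helper matchesWith, and the three-line input are made up for this example.

```java
import java.util.regex.Pattern;

public class FlagComparisonDemo {
    // Hypothetical helper: compiles the model pattern with the given flags
    // and checks whether it matches the entire input.
    static boolean matchesWith(int flags, String input) {
        return Pattern.compile(".*(Model: ModelCode\\[\\w*\\]).*", flags)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        // Assumed multi-line input in the shape described in the question.
        String input = "some html text\nModel: ModelCode[BR324]\nsome html text";

        System.out.println(matchesWith(Pattern.DOTALL, input));                     // true
        System.out.println(matchesWith(Pattern.DOTALL | Pattern.MULTILINE, input)); // true
        System.out.println(matchesWith(Pattern.MULTILINE, input));                  // false
        System.out.println(matchesWith(0, input));                                  // false
    }
}
```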