Java Regex Problem

I have a large HTML file as input text, from which I have to extract some information using pattern matching. The relevant "region" looks roughly like this:

 some html text
 <div debugState" style="display: none;">
            Model: ModelCode[BR324]
            Features: [S08TL, S0230, S0851, S0428, S01CD, S0879, S01CA, S08SP, S0698, S01CB, S0548, S08SC, S08TM, S01CC, S0801, S0258, P0668, S04AK]
            Packages: [S0801]
 </div>
        some html text

I wrote the following code (debInfo is the HTML source to be scanned):

Pattern model = Pattern.compile(".*(Model: ModelCode\\[\\w\\]).*, Pattern.DOTALL");
Pattern features = Pattern.compile(".*(Features: \\[\\w*\\]).*, Pattern.DOTALL");
Pattern packages = Pattern.compile(".*(Packages: \\[\\w*\\]).*, Pattern.DOTALL");


Matcher m1 = model.matcher(debInfo);
Matcher m2 = features.matcher(debInfo);
Matcher m3 = packages.matcher(debInfo);

boolean a = m1.matches();
boolean b = m2.matches();
boolean c = m3.matches();

System.out.println("matches(); " + a + " " + b + " " + c + " " + "\n" + debInfo);

and I am getting no match :-(. What am I doing wrong? Thanks in advance (a lot!)

You use \\w inside your (correctly escaped) square brackets. That matches only a single character. Try \\w+ or \\w* instead.

Also, you have included , Pattern.DOTALL inside your String literal, which I think is a typo. It should be the second argument to Pattern.compile:

Pattern model = Pattern.compile(".*(Model: ModelCode\\[\\w+\\]).*", Pattern.DOTALL);

Also note that \\w* will not work for the comma-and-space separated list of Features; you'll need something like [\\w\\s,]* there.
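To make the points above concrete, here is a minimal sketch of the corrected patterns run against a shortened copy of the sample region. The class name DebugInfoMatch, the helper firstGroup and the trimmed-down DEB_INFO string are made up for illustration; only the patterns come from the discussion above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DebugInfoMatch {
    // Shortened stand-in for the HTML region in the question (illustrative only).
    static final String DEB_INFO =
            "some html text\n"
          + "<div style=\"display: none;\">\n"
          + "    Model: ModelCode[BR324]\n"
          + "    Features: [S08TL, S0230, S0851]\n"
          + "    Packages: [S0801]\n"
          + "</div>\n"
          + "some html text\n";

    // Pattern.DOTALL is the second argument, not part of the string literal,
    // and \\w+ allows more than one character between the brackets.
    static final Pattern MODEL =
            Pattern.compile(".*(Model: ModelCode\\[\\w+\\]).*", Pattern.DOTALL);

    // The Features list contains commas and spaces, so a character class is needed.
    static final Pattern FEATURES =
            Pattern.compile(".*(Features: \\[[\\w\\s,]*\\]).*", Pattern.DOTALL);

    // Returns group 1 if the pattern matches the whole input, otherwise null.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstGroup(MODEL, DEB_INFO));
        System.out.println(firstGroup(FEATURES, DEB_INFO));
    }
}
```

With the single \\w from the question, firstGroup would return null for the same input, because BR324 is longer than one character.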

I think you need to use:

Pattern model = Pattern.compile(".*(Model: ModelCode\\[\\w*\\]).*", Pattern.DOTALL);
Pattern features = Pattern.compile(".*(Features: \\[\\w*\\]).*", Pattern.DOTALL);
Pattern packages = Pattern.compile(".*(Packages: \\[\\w*\\]).*", Pattern.DOTALL);

These are the correct patterns:

Pattern modelPattern = Pattern.compile(".*Model: ModelCode\\[(\\w*)\\].*",
        Pattern.DOTALL | Pattern.MULTILINE);
Pattern featuresPattern = Pattern.compile(".*Features: \\[([\\w\\s,]*)\\].*",
        Pattern.DOTALL | Pattern.MULTILINE);
Pattern packagesPattern = Pattern.compile(".*Packages: \\[([\\w\\s,]*)\\].*",
        Pattern.DOTALL | Pattern.MULTILINE);
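If only the bracketed values are needed, one possible alternative (not what the answers above use) is Matcher.find() with the same capture groups, which makes the leading and trailing .* and the flags unnecessary. The sketch below assumes that approach; DebugInfoExtract, extract and splitCodes are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DebugInfoExtract {
    // Same capture groups as above, without the surrounding ".*".
    static final Pattern MODEL = Pattern.compile("Model: ModelCode\\[(\\w*)\\]");
    static final Pattern FEATURES = Pattern.compile("Features: \\[([\\w\\s,]*)\\]");
    static final Pattern PACKAGES = Pattern.compile("Packages: \\[([\\w\\s,]*)\\]");

    // Returns capture group 1 of the first match found anywhere in the input, or null.
    static String extract(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Splits a captured list such as "S08TL, S0230" into trimmed codes.
    static List<String> splitCodes(String csv) {
        List<String> codes = new ArrayList<>();
        if (csv == null) {
            return codes;
        }
        for (String part : csv.split(",")) {
            String code = part.trim();
            if (!code.isEmpty()) {
                codes.add(code);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        String debInfo = "Model: ModelCode[BR324]\n"
                + "Features: [S08TL, S0230, S0851]\n"
                + "Packages: [S0801]\n";
        System.out.println(extract(MODEL, debInfo));
        System.out.println(splitCodes(extract(FEATURES, debInfo)));
        System.out.println(splitCodes(extract(PACKAGES, debInfo)));
    }
}
```

The difference from matches() is that find() looks for the pattern anywhere in the input instead of requiring the whole input to match.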

It is missing the MULTILINE switch.

 Pattern modelPattern = Pattern.compile(".*(Model: ModelCode\\[\\w*\\]).*", Pattern.DOTALL | Pattern.MULTILINE);
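A quick way to check which flag actually matters here is to compile the same pattern with different flag combinations. As far as the java.util.regex.Pattern documentation describes it, MULTILINE only changes where ^ and $ match, and this pattern uses neither, so DOTALL is expected to be the deciding flag. The sketch below is only an experiment under that assumption; FlagCheck and matchesWith are invented names.

```java
import java.util.regex.Pattern;

public class FlagCheck {
    static final String REGEX = ".*(Model: ModelCode\\[\\w*\\]).*";

    // Compiles REGEX with the given flags and reports whether the whole input matches.
    static boolean matchesWith(int flags, String input) {
        return Pattern.compile(REGEX, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Multi-line input, similar in shape to the region in the question.
        String debInfo = "some html text\n  Model: ModelCode[BR324]\nsome html text\n";

        System.out.println("no flags:            " + matchesWith(0, debInfo));
        System.out.println("MULTILINE only:      " + matchesWith(Pattern.MULTILINE, debInfo));
        System.out.println("DOTALL only:         " + matchesWith(Pattern.DOTALL, debInfo));
        System.out.println("DOTALL | MULTILINE:  "
                + matchesWith(Pattern.DOTALL | Pattern.MULTILINE, debInfo));
    }
}
```

Without DOTALL the dot does not match line terminators, so matches() cannot span the whole multi-line input regardless of MULTILINE.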
