What's wrong with my regex pattern?

I've been asked to match any HTML tag of the following forms using a regular expression:

A. <TAG ATTRIBUTE="VALUE"/> or
B. <TAG ATTRIBUTE="VALUE"> or
C. <TAG/> or
D. <TAG> or
E. </TAG>

Here is my pattern:

/** A pattern that matches a simple HTML markup. Group 1 matches
  *  the initial '/', if present.  Group 2 matches the tag.  Group
  *  3 matches the attribute name, if present.  Group 4 matches the
  *  attribute value (without quotes).  Group 5 matches the closing
  *  '/', if present. */
 public static final String HTML_P3 =
     "<(/)?\\s*([a-zA-Z]+)\\s*([a-zA-Z]+)?\\s*=?\\s*\\\"?([^\\\"]+)?\\\"?\\s*(/)?>";    

Here is a snippet of the test I was given:

public static void p3(String name, String markup) throws IOException {
    out.println("Problem #3.");
    Scanner inp = new Scanner(new FileReader(name));
    while (inp.findWithinHorizon(markup, 0) != null) {
        MatchResult mat = inp.match();
        if (mat.group(1) != null
            && (mat.group(5) != null || mat.group(3) != null)) {
            out.printf("Bad markup.%n");
            continue;
        }
        out.printf("Tag: %s", mat.group(2));
        if (mat.group(3) != null) {
            out.printf(", Attribute: %s, Value: \"%s\"",
                        mat.group(3), mat.group(4));
        }
        if (mat.group(5) != null || mat.group(1) != null) {
            out.print(" end");
        }
        out.println();
    }
    out.println();
}

Here is the input:

This is a simple <i>mark-up</i>.  Next comes
one <input value="3"/> that's closed, 
followed by a list of names:
<ol color="green">
<li> Tom </li>
<li  > Dick </li>
<li> Harry </li>
</ol>

The correct output should be:

Problem #3.
Tag: i
Tag: i end
Tag: input, Attribute: value, Value: "3" end
Tag: ol, Attribute: color, Value: "green"
Tag: li
Tag: li end
Tag: li
Tag: li end
Tag: li
Tag: li end
Tag: ol end

However, I never catch any end tag. Here is my output:

Problem #3.
Tag: i
Tag: input, Attribute: value, Value: "3" end
Tag: ol, Attribute: color, Value: "green"
Tag: li

I've tried regexpal.com, and there my pattern matches everything. Can someone shed some light on this, please?

First of all, since you are writing a regex pattern for Java, use a Java regex tester.

I'm not a Java expert, but I'm not sure you need to triple-escape the double quotes.
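On the escaping point, both spellings should behave the same. In a Java string literal, \" produces a bare quote and \\\" produces a backslash followed by a quote; the regex engine treats an escaped quote as a literal quote, so the extra backslashes are redundant rather than wrong. A small check, where the class name QuoteEscapes and the helper matchesQuote are made up for illustration:

```java
import java.util.regex.Pattern;

class QuoteEscapes {
    /** True if the given regex matches a string that is exactly one double quote. */
    static boolean matchesQuote(String regex) {
        return Pattern.matches(regex, "\"");
    }

    public static void main(String[] args) {
        System.out.println(matchesQuote("\""));    // regex seen by the engine: "
        System.out.println(matchesQuote("\\\""));  // regex seen by the engine: \"
    }
}
```

Both calls print true, so the single-escaped form is enough and easier to read.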

One of the problems in your pattern is that you make each piece of the attribute optional on its own, with successive question marks: ([a-zA-Z]+)?\\s*=?\\s*\\\"?([^\\\"]+)?\\\"? instead of wrapping the whole attribute in a single optional non-capturing group:

(?:([a-zA-Z]+)\\s*=\\s*\"([^\"]+)\")?

(if there is no attribute, then there is no equals sign, no quotes, and no value either)
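To see why this loses the end tags, it helps to print what the original pattern actually consumes. Because the value group ([^\"]+)? is optional by itself, it can match even when there is no attribute, and [^\"] also matches >, < and newlines. A single match can therefore start at an opening tag and run on through the following closing tag. A minimal sketch, assuming java.util.regex.Matcher in place of the Scanner from the question (the class name GreedyDemo and the helper methods are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // The pattern from the question, unchanged.
    static final String HTML_P3 =
        "<(/)?\\s*([a-zA-Z]+)\\s*([a-zA-Z]+)?\\s*=?\\s*\\\"?([^\\\"]+)?\\\"?\\s*(/)?>";

    /** Returns the full text consumed by the first match, or null if nothing matches. */
    static String firstMatch(String input) {
        Matcher m = Pattern.compile(HTML_P3).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Returns what group 4 (the "attribute value") captured in the first match. */
    static String firstValueGroup(String input) {
        Matcher m = Pattern.compile(HTML_P3).matcher(input);
        return m.find() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        String input = "This is a simple <i>mark-up</i>.";
        System.out.println(firstMatch(input));       // <i>mark-up</i>
        System.out.println(firstValueGroup(input));  // >mark-up</i
    }
}
```

The opening and closing tags are swallowed by one match whose tag is i, with group 1 and group 5 both null, so the test prints only "Tag: i". This is also why a tester that merely highlights matched text appears to show the pattern matching everything.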

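You can try this (written as a Java string). The \\s* before the closing (/)? is placed outside the optional attribute group so that a tag with whitespace but no attribute, such as <li  >, still matches:

"(?i)<(/)?([a-z1-6]+)(?:\\s+([a-z]+)\\s*=\\s*\"([^\"]*+)\")?\\s*(/)?>"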
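To check the suggestion against the sample input from the question, here is a minimal sketch that reruns the question's reporting logic with java.util.regex.Matcher instead of Scanner. The class name TagScan, the method scan, and the constant SAMPLE are hypothetical, and the pattern is written with the trailing \\s* outside the optional attribute group so that <li  > (whitespace, no attribute) still matches:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagScan {
    // Groups: 1 = leading '/', 2 = tag, 3 = attribute name, 4 = value, 5 = trailing '/'.
    static final Pattern TAG = Pattern.compile(
        "(?i)<(/)?([a-z1-6]+)(?:\\s+([a-z]+)\\s*=\\s*\"([^\"]*+)\")?\\s*(/)?>");

    // The input from the question.
    static final String SAMPLE =
        "This is a simple <i>mark-up</i>.  Next comes\n"
        + "one <input value=\"3\"/> that's closed, \n"
        + "followed by a list of names:\n"
        + "<ol color=\"green\">\n"
        + "<li> Tom </li>\n"
        + "<li  > Dick </li>\n"
        + "<li> Harry </li>\n"
        + "</ol>\n";

    /** Produces one report line per tag, following the logic of p3 in the question. */
    static List<String> scan(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            // An end tag must not carry an attribute or a trailing '/'.
            if (m.group(1) != null && (m.group(5) != null || m.group(3) != null)) {
                lines.add("Bad markup.");
                continue;
            }
            StringBuilder sb = new StringBuilder("Tag: ").append(m.group(2));
            if (m.group(3) != null) {
                sb.append(String.format(", Attribute: %s, Value: \"%s\"",
                                        m.group(3), m.group(4)));
            }
            if (m.group(5) != null || m.group(1) != null) {
                sb.append(" end");
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Problem #3.");
        scan(SAMPLE).forEach(System.out::println);
    }
}
```

On the sample input this produces the eleven tag lines listed as the correct output in the question, including the end tags that the original pattern swallowed.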
