Regex for an XML tag with an angle bracket inside

Question

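I need a regular expression that gives me an XML tag, for example <ABC/> or <ABC></ABC>.

If I use <(.)+?> here, it gives me <ABC/>, <ABC> or </ABC>. That is fine.

Now, the problem is this.

I have the following XML: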

<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&amp;C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>

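Here, as you can see, PROD_TYPE="COCOG EFI LWL P&amp;C >1Y-5Y" has a greater-than sign inside the attribute value.

So the regex returns

<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&amp;C >

instead of the complete

<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&amp;C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>

I need a regex that does not treat less-than and greater-than signs as tag delimiters when they are part of a value, i.e. enclosed in double quotes.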

Answer 1

You can try this:

(?i)<[a-z][\w:-]+(?: [a-z][\w:-]+="[^"]*")*/?>

Explanation:

(?i)         # Match the remainder of the regex with the options: case insensitive (i)
<            # Match the character “<” literally
[a-z]        # Match a single character in the range between “a” and “z”
[\w:-]       # Match a single character present in the list below
                # A word character (letters, digits, and underscores)
                # The character “:”
                # The character “-”
   +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?:          # Match the regular expression below (non-capturing group)
   \            # Match the character “ ” literally
   [a-z]        # Match a single character in the range between “a” and “z”
   [\w:-]       # Match a single character present in the list below
                   # A word character (letters, digits, and underscores)
                   # The character “:”
                   # The character “-”
      +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ="           # Match the characters “="” literally
   [^"]         # Match any character that is NOT a “"”
      *            # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   "            # Match the character “"” literally
)*           # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
/            # Match the character “/” literally
   ?            # Between zero and one times, as many times as possible, giving back as needed (greedy)
>            # Match the character “>” literally
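
As a quick check, the pattern can be run against the sample element from the question. This is only a minimal sketch: the class name TagRegexDemo, the constants and the helper firstMatch are made up for illustration, and find() is used so that the tag is located anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagRegexDemo {
    // The naive pattern from the question.
    static final Pattern NAIVE = Pattern.compile("<(.)+?>");
    // The attribute-aware pattern from this answer, escaped for a Java string literal.
    static final Pattern ATTRIBUTE_AWARE =
        Pattern.compile("(?i)<[a-z][\\w:-]+(?: [a-z][\\w:-]+=\"[^\"]*\")*/?>");

    // Sample element from the question (the constant name is hypothetical).
    static final String SAMPLE =
        "<VALUE ABC=\"10000\" PQR=\"12422700\" ADJ=\"\" "
        + "PROD_TYPE=\"COCOG EFI LWL P&amp;C >1Y-5Y\" SRC=\"BASE\" DATA=\"data\" "
        + "ACTION=\"INSERT\" ID=\"100000\" GRC_PROD=\"\"/>";

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Stops at the '>' inside the PROD_TYPE value.
        System.out.println(firstMatch(NAIVE, SAMPLE));
        // Consumes each quoted value as a unit, so the whole element is returned.
        System.out.println(firstMatch(ATTRIBUTE_AWARE, SAMPLE));
    }
}
```

Note what the pattern assumes: attributes are separated by exactly one space, values are double-quoted, and tag and attribute names are at least two characters long. Input that deviates from this will not match.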

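And if you want to match an open/close tag pair or a self-closed tag, try this regex:

(?i)(?:<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*>.*?</\1>|<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*/>)

A Java code snippet that does the same:

import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

try {
    Pattern tagPattern = Pattern.compile("(?i)(?:<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*>.*?</\\1>|<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*/>)");
    // find() searches anywhere in subjectString; String.matches() would require the whole string to match
    boolean foundMatch = tagPattern.matcher(subjectString).find();
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Hope this helps.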

Answer 2

To expand on G_H's link: do not use regular expressions to parse XML. Use XPath to return a Node, and pass that Node to an identity Transformer:

Node valueElement = (Node)
    XPathFactory.newInstance().newXPath().evaluate("//VALUE",
        new InputSource(new StringReader(xmlDocument)),
        XPathConstants.NODE);

StringWriter result = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
    new DOMSource(valueElement), new StreamResult(result));

String valueElementMarkup = result.toString();
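
For reference, here is the same approach as a self-contained sketch with its imports. The class name XPathTagDemo and the method elementMarkup are made up for illustration, and suppressing the XML declaration is an assumption about what the asker wants (only the element itself). The serialized markup is not guaranteed to be byte-for-byte identical to the source text: attribute order and character escaping may differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XPathTagDemo {
    // Returns the serialized markup of the first node selected by the XPath
    // expression, or null if nothing is selected.
    static String elementMarkup(String xmlDocument, String xpathExpression) {
        try {
            Node element = (Node) XPathFactory.newInstance().newXPath().evaluate(
                xpathExpression,
                new InputSource(new StringReader(xmlDocument)),
                XPathConstants.NODE);
            if (element == null) {
                return null;
            }

            // newTransformer() without a stylesheet yields an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // Assumption: only the element is wanted, so omit the <?xml ...?> declaration.
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter result = new StringWriter();
            identity.transform(new DOMSource(element), new StreamResult(result));
            return result.toString();
        } catch (Exception e) {
            // Wrap the checked XPath/transformer exceptions to keep the sketch short.
            throw new IllegalStateException("Could not extract element", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ROOT><VALUE ABC=\"10000\" "
            + "PROD_TYPE=\"COCOG EFI LWL P&amp;C >1Y-5Y\" GRC_PROD=\"\"/></ROOT>";
        System.out.println(elementMarkup(xml, "//VALUE"));
    }
}
```

Because the XML parser handles the quoting, a > inside an attribute value needs no special treatment here.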

Answer 3

Also try this:

<.*?(".*?".*?)*?>

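It captures everything between < and > only when the double quotes in between are paired, i.e. an even number of " characters is present and the values are enclosed in them. Otherwise it skips that > and keeps searching for the next > (which should occur after the closing " quote).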
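
A minimal sketch of how this pattern behaves on the sample from the question. The names QuotedTagDemo, LAZY, STRICT and firstMatch are made up for illustration. STRICT is not part of this answer: it is an alternative added for comparison, which excludes > and " outside quoted values explicitly instead of relying on the order in which the engine tries lazy matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedTagDemo {
    // Pattern from this answer, escaped for a Java string literal.
    static final Pattern LAZY = Pattern.compile("<.*?(\".*?\".*?)*?>");
    // Alternative for comparison (an assumption, not from the answer):
    // outside quotes neither '>' nor '"' may appear; quoted values are consumed whole.
    static final Pattern STRICT = Pattern.compile("<(?:[^>\"]|\"[^\"]*\")*>");

    // Sample element from the question (the constant name is hypothetical).
    static final String SAMPLE =
        "<VALUE ABC=\"10000\" PQR=\"12422700\" ADJ=\"\" "
        + "PROD_TYPE=\"COCOG EFI LWL P&amp;C >1Y-5Y\" SRC=\"BASE\" DATA=\"data\" "
        + "ACTION=\"INSERT\" ID=\"100000\" GRC_PROD=\"\"/>";

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Both patterns return the complete element for the sample.
        System.out.println(firstMatch(LAZY, SAMPLE));
        System.out.println(firstMatch(STRICT, SAMPLE));

        // With an unbalanced quote the two differ: LAZY falls back to the
        // first '>', STRICT finds no tag at all.
        String unbalanced = "<a b=\"x>y>";
        System.out.println(firstMatch(LAZY, unbalanced));
        System.out.println(firstMatch(STRICT, unbalanced));
    }
}
```

Since . also matches a double quote, the lazy pattern does not enforce the pairing. It only prefers it when a paired match exists. Which behavior is preferable depends on how malformed input should be treated.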

3 answers

Answer 1
1 accepted 2016-04-07 12:35:55

Answer 2
1 2016-04-07 15:18:58

Answer 3
0 2016-04-08 17:27:42
