
Regex XML tags having angle brackets inside

I need a regex that will give me XML tags, for example <ABC/> or <ABC></ABC>.

So if I use <(.)+?> here, it gives me <ABC/>, <ABC>, and </ABC>. This is fine.

Now, the problem is this:

I have an XML such as

<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&amp;C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>

Here, as you can see, PROD_TYPE="COCOG EFI LWL P&amp;C >1Y-5Y" has a greater-than sign inside the attribute value.

So the regex returns

<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&amp;C >

instead of the complete

<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&amp;C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>

I need a regex that ignores less-than and greater-than signs when they are part of a value, i.e. when they appear inside double quotes.
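To make the problem easy to reproduce, here is a minimal Java sketch that runs the lazy pattern against a shortened version of the sample tag. The class and method names (`NaiveTagMatch`, `firstTag`) are made up for illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveTagMatch {
    // The pattern from the question: lazily consume characters up to the first '>'.
    static final Pattern NAIVE = Pattern.compile("<(.)+?>");

    // Returns the first match of the naive pattern, or null if nothing matches.
    static String firstTag(String xml) {
        Matcher m = NAIVE.matcher(xml);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String xml = "<VALUE ABC=\"10000\" PROD_TYPE=\"COCOG EFI LWL P&amp;C >1Y-5Y\" SRC=\"BASE\"/>";
        // The match ends at the '>' inside the PROD_TYPE value, not at the end of the tag.
        System.out.println(firstTag(xml));
    }
}
```

The pattern has no notion of quoted values, so the first `>` it meets ends the match, wherever that character sits.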

You can try this:

(?i)<[a-z][\w:-]+(?: [a-z][\w:-]+="[^"]*")*/?>

Explanation:

(?i)         # Match the remainder of the regex with the options: case insensitive (i)
<            # Match the character “<” literally
[a-z]        # Match a single character in the range between “a” and “z”
[\w:-]       # Match a single character present in the list below
                # A word character (letters, digits, and underscores)
                # The character “:”
                # The character “-”
   +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?:          # Match the regular expression below (non-capturing group)
   (space)      # Match the character “ ” literally
   [a-z]        # Match a single character in the range between “a” and “z”
   [\w:-]       # Match a single character present in the list below
                   # A word character (letters, digits, and underscores)
                   # The character “:”
                   # The character “-”
      +            # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ="           # Match the characters “="” literally
   [^"]         # Match any character that is NOT a “"”
      *            # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   "            # Match the character “"” literally
)*           # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
/            # Match the character “/” literally
   ?            # Between zero and one times, as many times as possible, giving back as needed (greedy)
>            # Match the character “>” literally
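As a quick check of the pattern explained above, the following sketch applies it to the full tag from the question. The class and method names (`AttributeAwareTagMatch`, `findTags`) are illustrative assumptions; only the regex itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeAwareTagMatch {
    // The pattern from the answer, written as a Java string literal.
    static final Pattern TAG = Pattern.compile(
        "(?i)<[a-z][\\w:-]+(?: [a-z][\\w:-]+=\"[^\"]*\")*/?>");

    // Collects every opening or self-closing tag the pattern finds in the input.
    static List<String> findTags(String xml) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String xml = "<VALUE ABC=\"10000\" PQR=\"12422700\" ADJ=\"\" "
            + "PROD_TYPE=\"COCOG EFI LWL P&amp;C >1Y-5Y\" SRC=\"BASE\" DATA=\"data\" "
            + "ACTION=\"INSERT\" ID=\"100000\" GRC_PROD=\"\"/>";
        // Because [^"]* swallows the whole quoted value, the '>' inside PROD_TYPE is skipped.
        findTags(xml).forEach(System.out::println);
    }
}
```

Note that the pattern is fairly strict: it expects exactly one space before each attribute, double-quoted values only, and names of at least two characters (since `[a-z]` is followed by `[\w:-]+`), so real-world XML with other spacing or single-quoted attributes would need adjustments.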

Also, if you want to include open/close pairs and self-closed tags, try this regex:

(?i)(?:<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*>.+?</\1>|<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*/>)

Java code snippet implementing the same:

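import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

boolean foundMatch = false;
try {
    // find() searches anywhere in the input; String.matches() would require the whole string to be a single element
    foundMatch = Pattern.compile("(?i)(?:<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*>.+?</\\1>|<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*/>)").matcher(subjectString).find();
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}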
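For reference, here is a self-contained sketch of how the combined pattern could be used to collect whole elements from a flat fragment. The names `ElementMatch` and `findElements` are hypothetical, and the input string is a shortened stand-in for the data in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElementMatch {
    // First alternative: <name ...> content </name>, with \1 repeating the tag name.
    // Second alternative: a self-closed <name .../> tag.
    static final Pattern ELEMENT = Pattern.compile(
        "(?i)(?:<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*>.+?</\\1>"
        + "|<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*/>)");

    // Returns each matched element (open/close pair or self-closed tag) in document order.
    static List<String> findElements(String xml) {
        List<String> elements = new ArrayList<>();
        Matcher m = ELEMENT.matcher(xml);
        while (m.find()) {
            elements.add(m.group());
        }
        return elements;
    }

    public static void main(String[] args) {
        String xml = "<ABC>text</ABC><VALUE PROD_TYPE=\"P&amp;C >1Y-5Y\" SRC=\"BASE\"/>";
        findElements(xml).forEach(System.out::println);
    }
}
```

One caveat worth knowing: `.+?` needs at least one character of content, so an empty pair such as `<ABC></ABC>` is not matched by this pattern as written. Changing it to `.*?` would be one way to allow empty elements, though that is an adjustment beyond the original answer.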

Hope this helps...

To expand on G_H's link: don't use regular expressions to parse XML. Use XPath to return a Node, and pass that Node to an identity Transformer:

Node valueElement = (Node)
    XPathFactory.newInstance().newXPath().evaluate("//VALUE",
        new InputSource(new StringReader(xmlDocument)),
        XPathConstants.NODE);

StringWriter result = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
    new DOMSource(valueElement), new StreamResult(result));

String valueElementMarkup = result.toString();
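A self-contained version of the same idea might look like the sketch below. The class and method names (`ValueElementExtractor`, `extractMarkup`) are assumptions for illustration, and setting `OMIT_XML_DECLARATION` is an addition not present in the snippet above; without it the output typically begins with an `<?xml ...?>` declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ValueElementExtractor {
    // Evaluates the XPath expression and serializes the first matching node back to markup.
    // Returns null when the expression selects nothing.
    static String extractMarkup(String xmlDocument, String xpathExpression) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath().evaluate(
                xpathExpression,
                new InputSource(new StringReader(xmlDocument)),
                XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            // A Transformer created without a stylesheet copies its input unchanged (identity).
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // Assumption: we only want the element markup, not the XML declaration.
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter result = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(result));
            return result.toString();
        } catch (XPathExpressionException | TransformerException ex) {
            throw new IllegalStateException("Could not extract element markup", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<ROOT><VALUE PROD_TYPE=\"P&amp;C >1Y-5Y\" SRC=\"BASE\"/></ROOT>";
        System.out.println(extractMarkup(xml, "//VALUE"));
    }
}
```

The serialized markup is equivalent XML but not necessarily byte-for-byte identical to the input: the serializer may reorder attributes or escape characters such as `>` differently. If the exact original text matters, that is a trade-off to weigh against the robustness of a real parser.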

Also try this:

<.*?(".*?".*?)*?>

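It grabs everything between < and > only if an even number of " double quotes is present. A pair of double quotes means the content between them is enclosed; a > symbol inside such a pair is skipped, and the search continues to the next > (which should occur after the closing " quote).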
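The sketch below tries this pattern on the tag from the question, next to a more explicit variant of the same quote-pairing idea. The variant `<(?:"[^"]*"|[^>"])*>` and the names `QuotePairTagMatch` and `firstMatch` are assumptions added for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotePairTagMatch {
    // The pattern from the answer: lazy matching with optional quoted segments.
    static final Pattern LAZY_PAIRS = Pattern.compile("<.*?(\".*?\".*?)*?>");

    // Hypothetical stricter variant: each step consumes either a whole quoted
    // string or one character that is neither '>' nor '"'.
    static final Pattern EXPLICIT_PAIRS = Pattern.compile("<(?:\"[^\"]*\"|[^>\"])*>");

    // Returns the first match of the given pattern, or null if there is none.
    static String firstMatch(Pattern pattern, String xml) {
        Matcher m = pattern.matcher(xml);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String xml = "<VALUE ABC=\"10000\" PQR=\"12422700\" ADJ=\"\" "
            + "PROD_TYPE=\"COCOG EFI LWL P&amp;C >1Y-5Y\" SRC=\"BASE\" DATA=\"data\" "
            + "ACTION=\"INSERT\" ID=\"100000\" GRC_PROD=\"\"/>";
        System.out.println(firstMatch(LAZY_PAIRS, xml));
        System.out.println(firstMatch(EXPLICIT_PAIRS, xml));
    }
}
```

The explicit variant leaves the regex engine less room to backtrack, because a quote can only ever be consumed as part of a complete quoted string. Both patterns are still heuristics and do not validate that the input is well-formed XML.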
