
Jsoup: SelectorParseException when an XML tag contains a colon

An exception is thrown when an XML tag name contains a colon.

Exception:

org.jsoup.select.Selector$SelectorParseException: Could not parse query 'w:r': unexpected token at ':r'

XML:

<w:r>
 <w:rPr>
   <w:rStyle w:val="jid"/>
 </w:rPr>
 <w:t>AN</w:t>
</w:r>

Java code:

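    org.jsoup.nodes.Document doc = Jsoup.parse(documentXmlString);
    doc.select("w:r"); // the query from the exception message; this call throws SelectorParseException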

Here, documentXmlString holds the XML shown above.

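Just replace ":" with "|" in the selector. Jsoup's selector syntax writes a namespaced tag as ns|tag, so <w:r> is matched by: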

doc.select("w|r");

I'm using Jsoup 1.5.2.

Although your workaround worked for you, I would like to explain how namespaces work.

The w: in your XML is called a namespace prefix. To use a namespace prefix, it has to be declared, typically on the root node (1+). The declaration was missing from your source XML, and a namespace-aware XML parser would report an error for that. Below is the way to declare a namespace in XML. I have corrected your XML, and a namespace-aware parser should accept it now:

<w:r xmlns:w="http://www.w3.org/SomeNamespace">
  <w:rPr>
    <w:rStyle w:val="jid"/>
  </w:rPr>
  <w:t>AN</w:t>
</w:r>
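
The effect of the declaration can be checked with a small sketch. It uses the DOM parser bundled with the JDK (javax.xml.parsers) in namespace-aware mode, not jsoup, and the class and method names are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class NamespaceDeclarationDemo {

    // Hypothetical helper: true if a namespace-aware parser accepts the XML.
    static boolean acceptedByNamespaceAwareParser(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the factory default is false
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn every reported error into an exception instead of printing it.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // for example, a prefix that is not bound to a namespace
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String undeclared = "<w:r><w:t>AN</w:t></w:r>";
        String declared =
            "<w:r xmlns:w=\"http://www.w3.org/SomeNamespace\"><w:t>AN</w:t></w:r>";
        System.out.println(acceptedByNamespaceAwareParser(undeclared));
        System.out.println(acceptedByNamespaceAwareParser(declared));
    }
}
```

With the JDK's default parser, the undeclared variant is expected to be rejected and the declared one accepted. Note that this says nothing about jsoup's selector parser, which rejects the query string 'w:r' regardless of how the document is written.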

Additional information:

A namespace declaration has its own scope, as the example below shows:

<root>
    <w:r xmlns:w="http://www.w3.org/SomeNamespace">
      <w:rPr>
        <w:rStyle w:val="jid"/>
      </w:rPr>
      <w:t>AN</w:t>
    </w:r>
    <someotherElement>
      <dummychild/>
    </someotherElement>
</root>

In the above example, you cannot use the namespace prefix w on <someotherElement> or <dummychild/>, because the scope of the prefix w is limited to the element <w:r> and its descendants.


1+: A namespace declaration is valid for the element on which it is declared and for that element's descendants. Declaring the namespace on the root element makes it available to every element in the XML document.
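
The scoping rule can be observed with the JDK's DOM API. The following is a minimal sketch, assuming a condensed copy of the <root> example; the class and helper names are hypothetical. It asks each element which URI the prefix w resolves to at that point, using the standard Node.lookupNamespaceURI method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamespaceScopeDemo {

    // Condensed version of the <root> example, with the prefix declared on <w:r>.
    static final String XML =
          "<root>"
        + "<w:r xmlns:w=\"http://www.w3.org/SomeNamespace\"><w:t>AN</w:t></w:r>"
        + "<someotherElement><dummychild/></someotherElement>"
        + "</root>";

    // Hypothetical helper: the URI that prefix "w" resolves to at the first element
    // with the given qualified name, or null if "w" is not in scope there.
    static String uriBoundToW(String qualifiedName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Node element = doc.getElementsByTagName(qualifiedName).item(0);
            return element.lookupNamespaceURI("w");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(uriBoundToW("w:t"));        // inside the scope of <w:r>
        System.out.println(uriBoundToW("dummychild")); // outside the scope of <w:r>
    }
}
```

Inside <w:r> the lookup should return the declared URI, while for <dummychild/> it should return null, because no ancestor of that element declares the prefix.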

I used:

 documentXmlString = documentXmlString.replaceAll("w:","w");
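
One caveat with this replacement is that it rewrites every "w:" in the string, including any that occur in text content. A narrower variant, sketched below under the assumption that only tag names need to change (attributes such as w:val are left alone), matches the prefix only directly after "<" or "</". The class and method names are hypothetical, and regex-based rewriting of XML is a workaround, not a robust parser.

```java
import java.util.regex.Pattern;

class PrefixStripDemo {

    // Matches "w:" only when it directly follows "<" or "</", i.e. in a tag name.
    private static final Pattern TAG_PREFIX = Pattern.compile("(</?)w:");

    // Turns <w:r> into <wr> and </w:r> into </wr>, leaving text content untouched.
    static String stripTagPrefix(String xml) {
        return TAG_PREFIX.matcher(xml).replaceAll("$1w");
    }

    public static void main(String[] args) {
        String xml = "<w:r><w:t>show: AN</w:t></w:r>";
        System.out.println(xml.replaceAll("w:", "w")); // also alters the text "show:"
        System.out.println(stripTagPrefix(xml));       // alters only the tag names
    }
}
```

After either replacement the tags no longer contain a colon, so they can presumably be selected by plain names such as wr.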

Jsoup is an HTML parser, not an XML parser. For XML you can use JAXB, Saxon, or XStream.
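
As an illustration that needs no extra dependency, the sketch below uses the DOM parser bundled with the JDK (javax.xml.parsers) rather than any of the libraries named above. It reads the XML from the question and looks up an element by its qualified name, colon included. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomQualifiedNameDemo {

    // Hypothetical helper: text of the first element with the given qualified
    // name, or null if the document contains no such element.
    static String firstText(String xml, String qualifiedName) {
        try {
            // The factory is not namespace-aware by default, so "w:t" is treated
            // as a plain element name and the missing xmlns:w declaration is not an error.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList matches = doc.getElementsByTagName(qualifiedName);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<w:r>\n"
            + " <w:rPr>\n"
            + "   <w:rStyle w:val=\"jid\"/>\n"
            + " </w:rPr>\n"
            + " <w:t>AN</w:t>\n"
            + "</w:r>";
        System.out.println(firstText(xml, "w:t"));
    }
}
```

For documents that do declare their namespaces, calling setNamespaceAware(true) on the factory and using getElementsByTagNameNS is the more conventional route.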
