
Jsoup: SelectorParseException when an XML tag contains a colon

An exception is thrown when an XML tag contains a colon:


org.jsoup.select.Selector$SelectorParseException: Could not parse query 'w:r': unexpected token at ':r'


   <w:rStyle w:val="jid"/>

Java code:

    org.jsoup.nodes.Document doc = Jsoup.parse(documentXmlString);

Here documentXmlString holds the XML shown above.

Just replace ":" with "|" in the selector query, for example w|r instead of w:r.
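As far as I know, jsoup's selector syntax reserves the colon for pseudo-selectors such as :contains(...), and writes namespaced tags as ns|tag, which is why 'w:r' fails to parse as a query. A minimal sketch of the conversion, using only plain String calls (the class and method names are made up for illustration and are not part of jsoup):

```java
public class NamespaceSelector {

    /**
     * Converts a prefixed XML name such as "w:r" into the "w|r" form
     * expected in a selector query. Hypothetical helper, not a jsoup API.
     */
    public static String toSelector(String qualifiedName) {
        return qualifiedName.replace(':', '|');
    }

    public static void main(String[] args) {
        System.out.println(toSelector("w:r"));      // w|r
        System.out.println(toSelector("w:rStyle")); // w|rStyle
    }
}
```

The resulting string would then be passed to the select call in place of the original query, e.g. doc.select(NamespaceSelector.toSelector("w:r")), assuming doc is the parsed document from the question.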


I'm using Jsoup 1.5.2.

Although your workaround has worked for you, I would like to add some background on namespaces.

The w: in your XML is called a namespace prefix. To use a namespace prefix, it has to be declared in the root node (1+). Since the declaration was missing from your source XML, the parser was throwing an error. Below is how to declare a namespace in XML. I have corrected your XML, and I expect it will no longer raise an error.

    <w:r xmlns:w="http://www.w3.org/SomeNamespace">
        <w:rStyle w:val="jid"/>
    </w:r>

Additional information:

A namespace declaration has its own scope. Consider the example below:

    <root>
        <w:r xmlns:w="http://www.w3.org/SomeNamespace">
            <w:rStyle w:val="jid"/>
        </w:r>
        <someotherElement>
            <dummychild/>
        </someotherElement>
    </root>

In the above example, you cannot use the namespace prefix on <someotherElement> or <dummychild/>, because the scope of the prefix w is limited to the element <w:r> and its descendants.

1+: A namespace is valid for the element on which it is declared and for that element's descendants. Declaring the namespace on the root element makes it available to every element in the XML document.
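To make the scoping rule concrete, here is a small sketch that uses the namespace-aware DOM parser shipped with the JDK (not jsoup). The XML string, the class name, and the helper names are my own assumptions for illustration; only the element names and the namespace URI come from the answer above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceScopeDemo {

    public static final String NS = "http://www.w3.org/SomeNamespace";

    // Illustrative document: the prefix w is declared on <w:r> only.
    public static final String XML =
          "<root>"
        + "<w:r xmlns:w=\"" + NS + "\"><w:rStyle w:val=\"jid\"/></w:r>"
        + "<someotherElement><dummychild/></someotherElement>"
        + "</root>";

    /** Parses with namespace awareness; returns null if the XML is rejected. */
    public static Document parseOrNull(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Covers parser configuration, I/O, and well-formedness errors.
            return null;
        }
    }

    /** Returns the URI that the prefix is bound to at the first element with the given tag name. */
    public static String prefixUriAt(Document doc, String tagName, String prefix) {
        Node node = doc.getElementsByTagName(tagName).item(0);
        return node.lookupNamespaceURI(prefix);
    }

    public static void main(String[] args) {
        Document doc = parseOrNull(XML);
        // Inside <w:r>, the prefix resolves to the declared URI.
        System.out.println(prefixUriAt(doc, "w:rStyle", "w"));
        // Outside <w:r>, the prefix is not in scope.
        System.out.println(prefixUriAt(doc, "dummychild", "w"));
        // A prefixed element without any declaration is rejected outright.
        System.out.println(parseOrNull("<w:r><w:rStyle w:val=\"jid\"/></w:r>") == null);
    }
}
```

With the JDK's built-in parser, the first lookup should return the declared URI, the second should return null, and the undeclared document should fail to parse (the parser may also log the error to stderr).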


 documentXmlString = documentXmlString.replaceAll("w:","w");
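One thing worth checking before relying on this replacement is what it does to the sample tag. The sketch below applies the same substitution to the snippet from the question; the class and method names are hypothetical, and replace is used instead of replaceAll only because no regex is needed here.

```java
public class PrefixStripDemo {

    /** Removes the colon from every "w:" occurrence, as in the one-liner above. */
    public static String stripPrefixColon(String xml) {
        // String.replace treats "w:" as a literal, not as a regular expression.
        return xml.replace("w:", "w");
    }

    public static void main(String[] args) {
        String sample = "<w:rStyle w:val=\"jid\"/>";
        System.out.println(stripPrefixColon(sample)); // <wrStyle wval="jid"/>
    }
}
```

Note that the substitution is purely textual: attribute names such as w:val are rewritten too, and so is any "w:" that happens to appear in text content or attribute values.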

Jsoup is an HTML parser, not an XML parser. For XML you can use JAXB, Saxon, or XStream.

