An exception is thrown when an XML tag name contains a colon.
Exception:
org.jsoup.select.Selector$SelectorParseException: Could not parse query 'w:r': unexpected token at ':r'
XML:
<w:r>
<w:rPr>
<w:rStyle w:val="jid"/>
</w:rPr>
<w:t>AN</w:t>
</w:r>
Java code:
org.jsoup.nodes.Document doc = Jsoup.parse(documentXmlString);
doc.select("w:r"); // the query from the exception message above
Here, documentXmlString holds the XML shown above.
Just replace ":" with "|" in the selector:
doc.select("w|r");
I'm using Jsoup 1.5.2.
Your workaround works, but some background on namespaces may help. The w: in your XML is a namespace prefix, and a namespace prefix has to be declared before it can be used, typically on the root node (see 1+). Your source XML has no such declaration, so a namespace-aware XML parser will report an error for it. Here is your own XML with the namespace declared; it should parse without that error:
<w:r xmlns:w="http://www.w3.org/SomeNamespace">
<w:rPr>
<w:rStyle w:val="jid"/>
</w:rPr>
<w:t>AN</w:t>
</w:r>
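To check the corrected XML, here is a minimal sketch that uses the JDK's built-in DOM parser rather than jsoup. The class name NamespaceLookup and the helper firstText are made up for illustration, and the namespace URI is the placeholder from the XML above, not a real one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup or the original question.
class NamespaceLookup {
    // Placeholder URI copied from the corrected XML above.
    static final String W_NS = "http://www.w3.org/SomeNamespace";

    // Returns the text of the first <w:t> element, or null if there is none.
    static String firstText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace support is off by default
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Look elements up by namespace URI + local name, not by prefix.
            NodeList nodes = doc.getElementsByTagNameNS(W_NS, "t");
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:r xmlns:w=\"" + W_NS + "\">"
                + "<w:rPr><w:rStyle w:val=\"jid\"/></w:rPr>"
                + "<w:t>AN</w:t>"
                + "</w:r>";
        System.out.println(firstText(xml)); // prints AN
    }
}
```

The lookup goes through the namespace URI, so it keeps working even if the document uses a different prefix for the same namespace.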
Additional information:
A namespace declaration has its own scope. Consider the example below:
<root>
<w:r xmlns:w="http://www.w3.org/SomeNamespace">
<w:rPr>
<w:rStyle w:val="jid"/>
</w:rPr>
<w:t>AN</w:t>
</w:r>
<someotherElement>
<dummychild/>
</someotherElement>
</root>
In the above example, you cannot use the namespace prefix w on <someotherElement> or <dummychild/>, because the scope of the prefix w is limited to the element <w:r> and its children and grandchildren.
1+: A namespace is valid for the element on which it is declared and for all of that element's child nodes. Declaring the namespace on the root element makes it available to every element in the XML document.
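The scoping rule can be checked with a small sketch, again using the JDK's DOM parser with namespace awareness switched on. The class NamespaceScope and the method isNamespaceWellFormed are hypothetical names, and the URI is the placeholder used above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustrating namespace scope.
class NamespaceScope {
    static final String NS = "http://www.w3.org/SomeNamespace"; // placeholder URI

    // True if a namespace-aware parser accepts the document,
    // false if it rejects it (for example because of an undeclared prefix).
    static boolean isNamespaceWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prefix used only inside the element that declares it.
        String inScope = "<root><w:r xmlns:w=\"" + NS + "\"><w:t>AN</w:t></w:r>"
                + "<someotherElement><dummychild/></someotherElement></root>";
        // Prefix used on a sibling, outside the declaring element.
        String outOfScope = "<root><w:r xmlns:w=\"" + NS + "\"><w:t>AN</w:t></w:r>"
                + "<w:someotherElement/></root>";
        // Declaration moved to the root, so every element may use the prefix.
        String onRoot = "<root xmlns:w=\"" + NS + "\"><w:r/><w:someotherElement/></root>";

        System.out.println(isNamespaceWellFormed(inScope));    // true
        System.out.println(isNamespaceWellFormed(outOfScope)); // false
        System.out.println(isNamespaceWellFormed(onRoot));     // true
    }
}
```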
I used:
documentXmlString = documentXmlString.replaceAll("w:","w");
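For reference, a small sketch of what that replacement does to the XML from the question. The class PrefixStrip and its method are hypothetical names. Note that the replacement is a plain text substitution: it also rewrites attribute names and any "w:" that happens to appear inside text content, so the selector afterwards has to be "wr" rather than "w:r".

```java
// Hypothetical helper class showing the effect of the replaceAll workaround.
class PrefixStrip {
    // Removes the colon after every "w", wherever it occurs in the string.
    static String stripPrefix(String xml) {
        return xml.replaceAll("w:", "w");
    }

    public static void main(String[] args) {
        String xml = "<w:r><w:rPr><w:rStyle w:val=\"jid\"/></w:rPr><w:t>AN</w:t></w:r>";
        // prints <wr><wrPr><wrStyle wval="jid"/></wrPr><wt>AN</wt></wr>
        System.out.println(stripPrefix(xml));

        // Caveat: text content is affected too.
        // prints <wt>show AN</wt>
        System.out.println(stripPrefix("<w:t>show: AN</w:t>"));
    }
}
```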
jsoup is an HTML parser, not an XML parser. For XML you can use JAXB, Saxon, or XStream.