I'm trying to replace the values of some attributes in an SVG file using the StAX iterator API. I read the original file with XMLEventReader, check and modify the elements, and then write them to an XMLEventWriter.
My original file has the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<!--
...
-->
<!DOCTYPE ...
...
]>
<svg ...
The output I get is not the same:
<?xml version="1.0"?><!--
...
--><!DOCTYPE ...
...
]><svg ...
As you can see, encoding is gone, as well as the newlines around the comment and the doctype.
Also, the order of attributes on every tag in the resulting file seems to be random. I've read another question and I'm aware that attribute order is not guaranteed, but this doesn't help me.
These SVG files are under Git, so I'd like to preserve their plain-text layout as much as possible.
How do I fix those issues? With my current task, I could just replace attribute values as plain text, without any parsing, but I would like to have a solution which would allow me to take tag nesting and things like that into account.
If it can't be done with StAX, I'm totally open to different approaches. I've already tried the DOM approach, and it's even worse. Maybe there are some third-party parsers...
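VTD-XML (an open-source project of which I am the author) is a Java API that keeps the underlying bytes intact after parsing while still exposing the hierarchical structure of the XML tree. This means you can replace any part of the bytes in place, without touching unrelated parts of the file, or even overwrite the bytes directly, with zero overhead.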
When all you need is to update attributes, the best option is not to use XMLEventWriter at all, but to find the positions (character offsets) of tags in the XML file and make substring replacements. You can do it like this: create an XMLEventReader, iterate through the file, call XMLEvent#getLocation() on each event, and then call getCharacterOffset() on the result, which returns the position in the original file where this event was emitted. Downside: you have to parse attributes manually, but this is trivial in most cases.
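The offset lookup described above can be sketched roughly as follows. This is a minimal illustration, not a complete replacement tool: the class and method names (StartTagOffsets, startElementOffsets) are made up for the example, and it only collects the offsets reported for start tags. Whether a given offset points at the beginning or the end of the tag depends on the StAX implementation, so it is worth printing a few values against a known file before building the substring replacement on top.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: names are illustrative, not from any library.
class StartTagOffsets {

    // Returns the character offset reported for every start tag, in document order.
    static List<Integer> startElementOffsets(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        List<Integer> offsets = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    // Location is implementation-specific: verify what it points at.
                    offsets.add(event.getLocation().getCharacterOffset());
                }
            }
        } finally {
            reader.close();
        }
        return offsets;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<svg><g id=\"a\"><rect width=\"1\"/></g></svg>";
        System.out.println(startElementOffsets(xml));
    }
}
```

With the offsets in hand, the attribute value can be located in the original text near each position and replaced with plain string operations, leaving the rest of the file byte-for-byte untouched.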
Also, I found an issue with Characters events: they are reported after the subsequent < or </ has already been consumed. For example, in <foo>bar</foo> the bar characters will be reported as bar</.
This may be different in other implementations of StAX; I'm using the default one from the Java library. I assume this behavior can be explained by the fact that a StAX parser never goes backwards: by the time it has enough information to detect the end of a characters event, it has already consumed the beginning of the next element (opening or closing tag).
As for my original attempts to use XMLEventWriter: the encoding in the XML header can be added by explicitly constructing a new StartDocument event.
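A rough sketch of that StartDocument replacement, assuming the output should be declared as UTF-8 and version 1.0. The class and method names (EncodingPreservingCopy, copyWithEncoding) are invented for the example. It only addresses the missing encoding; newlines around the comment and doctype, and attribute order, are not restored by this.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: names are illustrative, not from any library.
class EncodingPreservingCopy {

    // Copies all events, swapping the parsed StartDocument for one that names the encoding.
    static String copyWithEncoding(String xml) throws XMLStreamException {
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The writer's encoding matches the one declared in the new StartDocument.
        XMLEventWriter writer =
                XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartDocument()) {
                // Assumption: the target file is UTF-8, XML version 1.0.
                event = eventFactory.createStartDocument("UTF-8", "1.0");
            }
            writer.add(event);
        }
        writer.close();
        reader.close();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(copyWithEncoding("<?xml version=\"1.0\"?><svg width=\"1\"/>"));
    }
}
```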