handling '&' on-the-fly with xmllint

I have a large number of XML files to parse with xmllint. I just need to pull out the content of one or two nodes and dump them into some new files.

I have no control over their format before they get to me.

I am trying to find a graceful way to handle characters like "&" (ampersand), which are not always escaped in the source XML files.

Is there some way to handle this in a single xmllint command, or do I need to prepare the XML files first?

I don't know about xmllint, but I would suggest using other tools for this. A script such as html2text may work too.
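A minimal sketch of the "prepare the files first" route, assuming GNU sed (it relies on -E and the \x01 escape). escape_bare_amps is a hypothetical helper, not part of xmllint: it rewrites every "&" that does not already start an entity or character reference as "&amp;", and xmllint then reads the cleaned text from stdin ("-"). The entity-name pattern is a simplified ASCII approximation of an XML name, and the XPath is only a placeholder.

```shell
#!/usr/bin/env bash
# escape_bare_amps (hypothetical helper): read XML on stdin and write it to
# stdout with bare '&' characters escaped. Assumes GNU sed.
escape_bare_amps() {
  # 1. Protect existing references (&amp; &lt; &#38; &#x26; ...) by swapping
  #    their '&' for a control character (\x01) that is not valid in XML text.
  # 2. Escape every remaining '&' as '&amp;'.
  # 3. Restore the protected references.
  sed -E \
    -e 's/&([A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);/\x01\1;/g' \
    -e 's/&/\&amp;/g' \
    -e 's/\x01/\&/g'
}

# Example usage (the loop only runs when file names are passed as arguments):
# clean each file on the fly, then let xmllint read it from stdin.
for f in "$@"; do
  escape_bare_amps < "$f" | xmllint --xpath '/xpath/to/extract/message/text()' -
done
```

Note that something like "&T;" is treated as an (undefined) entity reference and left alone, so xmllint may still reject it; only ampersands that cannot be the start of a reference are escaped.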

In my case I solved it with:

echo -e "$(echo "$responseXml" | xmllint --xpath '/xpath/to/extract/message/text()' - 2>/dev/null | sed 's/\&#\(x..\);/\\\1/g')" | iconv --from=iso88591

The iconv step may be unnecessary if your XML is not in ISO-8859-1, or if you don't want to convert it to your locale's encoding (typically UTF-8).
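The decoding half of that pipeline can also be wrapped in a small function. This is a sketch assuming bash, whose printf %b understands \xHH escapes the same way echo -e does; decode_hex_refs is a hypothetical name. Like the sed pattern above, it only handles two-digit hex references such as &#xE9;.

```shell
#!/usr/bin/env bash
# decode_hex_refs (hypothetical helper): read text on stdin, turn two-digit
# hex character references like &#xE9; into \xE9 escapes with sed, then let
# bash's printf %b expand those escapes into raw bytes.
# Caveats: longer references such as &#x20AC; are left untouched, and any
# literal backslash in the input is also interpreted as an escape.
decode_hex_refs() {
  local escaped
  escaped=$(sed 's/&#x\([0-9A-Fa-f][0-9A-Fa-f]\);/\\x\1/g')
  printf '%b\n' "$escaped"
}
```

It would slot into the original pipeline as: echo "$responseXml" | xmllint --xpath '/xpath/to/extract/message/text()' - 2>/dev/null | decode_hex_refs | iconv --from=iso88591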
