
Non-destructive parsing and modifying of HTML elements in C++

I need to make some simple modifications to HTML in C++, preferably without the HTML being completely rewritten, which is what happens when I use libxml2 or MSHTML.

In particular, I need to read, and then (potentially) modify, the "src" attribute of all "img" elements. The solution needs to be robust enough to handle any valid HTML, preferably without changing any of the other HTML in the process.

Are there any libraries out there that would be able to handle this? Or is this something I can do with regular expressions? I'm not too savvy with regular expressions, and I've read a lot of questions here that say you shouldn't use them to parse HTML, but I'm not clear if that applies to something like this or if that principle applies primarily to parsing in the context of building a tree from the HTML.

Regular expressions aren't recommended for HTML in general because they can't match arbitrarily nested tags. An "img" element has no content and can't be nested, so a regular expression should be fine for this purpose.
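As a rough illustration of that idea, here is a minimal sketch using std::regex that rewrites only the "src" values and copies every other byte through unchanged. The function name rewrite_img_src and the rewrite callback are made up for this example. It assumes the tag contains no ">" before the src attribute, that no other quoted attribute value contains text that looks like " src=", and that no img-like text sits inside comments or script blocks, so treat it as a starting point rather than a complete solution.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical helper: rewrites the src value of every <img> tag in `html`.
// `rewrite` receives the current src value (without quotes) and returns the
// new one. Everything outside the src values is copied through unchanged.
std::string rewrite_img_src(
    const std::string& html,
    const std::function<std::string(const std::string&)>& rewrite)
{
    // Group 1: from "<img" up to and including "src=" (case-insensitive).
    // Group 2: the value, double-quoted, single-quoted, or unquoted.
    // "\s" before "src" keeps attributes such as data-src from matching.
    static const std::regex img_src(
        R"re((<img\b[^>]*?\ssrc\s*=\s*)("[^"]*"|'[^']*'|[^\s"'>]+))re",
        std::regex::ECMAScript | std::regex::icase);

    std::string out;
    auto copied_up_to = html.cbegin();
    for (std::sregex_iterator it(html.cbegin(), html.cend(), img_src), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        std::string value = m[2].str();

        // Remember the original quote character (if any) so it is preserved.
        char quote = 0;
        if (value.size() >= 2 &&
            (value.front() == '"' || value.front() == '\'')) {
            quote = value.front();
            value = value.substr(1, value.size() - 2);
        }

        // Copy the untouched text, including "<img ... src=", verbatim.
        out.append(copied_up_to, m[2].first);
        if (quote) out.push_back(quote);
        out += rewrite(value);
        if (quote) out.push_back(quote);
        copied_up_to = m[2].second;
    }
    out.append(copied_up_to, html.cend());  // tail after the last match
    return out;
}
```

Calling it with a callback that returns its argument unchanged should give back the input byte for byte, which is a quick way to check that nothing else in the document is being touched.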

Try looking at HTML Tidy.

I have used it for similar things in the past.
