
Are there any benefits of using Beautiful Soup to parse XML over using lxml alone?

I use Beautiful Soup often to parse HTML files, so when I recently needed to parse an XML file, I chose to use it. However, because I'm parsing an extremely large file, it failed. While researching why it failed, I was led to this question: Loading huge XML files and dealing with MemoryError.

This leads me to my question: if lxml can handle large files and Beautiful Soup cannot, is there any benefit to using Beautiful Soup instead of simply using lxml directly?
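For context, the streaming approach that linked question points toward is lxml's iterparse, which reads the file incrementally instead of building the whole tree in memory. A minimal sketch, assuming a hypothetical large.xml made of repeated <record> elements, each with a <name> child:

    from lxml import etree

    # Stream the file instead of building the whole tree in memory.
    # "large.xml", the <record> tag, and the <name> child are all
    # placeholders for illustration.
    for event, elem in etree.iterparse("large.xml", tag="record"):
        print(elem.findtext("name"))  # process the element here
        # Clear processed elements so memory use stays flat
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]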

If you look at this link about the BeautifulSoup Parser:

"BeautifulSoup" is a Python package that parses broken HTML, while "lxml" does so faster but with high quality HTML/XML. So if you're dealing with the first one you're better off with BS... but the advantage of having "lxml" is that you're able to get the soupparser .

The link I provided at the top shows how you can use the capabilities of BS together with lxml.
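As a quick illustration of that combination, lxml's soupparser module uses BeautifulSoup for the lenient parsing but hands back an ordinary lxml tree; a sketch assuming both lxml and beautifulsoup4 are installed:

    from lxml.html import soupparser

    # Markup too broken for a strict parser
    broken = "<p>Unclosed paragraph <b>bold<i>nested"

    # BeautifulSoup does the lenient parsing under the hood...
    root = soupparser.fromstring(broken)

    # ...but the result is a regular lxml tree, so XPath works on it
    print(root.xpath("//b/text()"))  # -> ['bold']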

So in the end ... you are better off with "lxml".

lxml is very fast, and is relatively memory efficient. BeautifulSoup by itself scores less well on the efficiency end, but is built to be compatible with non-standard/broken HTML and XML, meaning it is ultimately more versatile.

Which you choose really depends on your use case: web scraping? Probably BS. Parsing machine-written structured metadata? lxml is a great choice.

There is also the learning curve to consider when making the switch: the two systems implement search and navigation strategies in slightly different ways, enough to make learning one system after starting with the other a non-trivial task.
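For example, the same lookup reads quite differently in the two systems; a small sketch on a made-up fragment:

    from bs4 import BeautifulSoup
    from lxml import etree

    doc = "<root><item id='1'>first</item><item id='2'>second</item></root>"

    # Beautiful Soup style: find_all and method calls on tag objects
    soup = BeautifulSoup(doc, "xml")
    print([item.get_text() for item in soup.find_all("item")])  # ['first', 'second']

    # lxml style: XPath expressions against the element tree
    tree = etree.fromstring(doc)
    print(tree.xpath("//item[@id='2']/text()"))  # ['second']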
