
Are there any benefits of using Beautiful Soup to parse XML over using lxml alone?

I use Beautiful Soup often to parse HTML files, so when I recently needed to parse an XML file, I chose to use it. However, because I'm parsing an extremely large file, it failed. While researching why it failed, I was led to this question: Loading huge XML files and dealing with MemoryError.

This leads me to my question: if lxml can handle large files and Beautiful Soup cannot, is there any benefit to using Beautiful Soup instead of simply using lxml directly?
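For context, the streaming approach that linked question points toward is lxml's iterparse, which reads the file incrementally instead of building the whole tree in memory. A minimal sketch, assuming a hypothetical large.xml made of repeated <record> elements, each with a <name> child:

    from lxml import etree

    # Stream the file instead of building the whole tree in memory.
    # "large.xml", the <record> tag, and the <name> child are all
    # placeholders for illustration.
    for event, elem in etree.iterparse("large.xml", tag="record"):
        print(elem.findtext("name"))  # process the element here
        # Clear processed elements so memory use stays flat
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]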

If you look at this link about the BeautifulSoup Parser:

"BeautifulSoup" is a Python package that parses broken HTML, while "lxml" does so faster but with high quality HTML/XML. So if you're dealing with the first one you're better off with BS... but the advantage of having "lxml" is that you're able to get the soupparser .

The link I provided at the top shows how you can use the capabilities of BS together with lxml.
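As a quick illustration of that combination, lxml's soupparser module uses BeautifulSoup for the lenient parsing but hands back an ordinary lxml tree; a sketch assuming both lxml and beautifulsoup4 are installed:

    from lxml.html import soupparser

    # Markup too broken for a strict parser
    broken = "<p>Unclosed paragraph <b>bold<i>nested"

    # BeautifulSoup does the lenient parsing under the hood...
    root = soupparser.fromstring(broken)

    # ...but the result is a regular lxml tree, so XPath works on it
    print(root.xpath("//b/text()"))  # -> ['bold']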

So in the end ... you are better off with "lxml".

lxml is very fast, and is relatively memory efficient. BeautifulSoup by itself scores less well on the efficiency end, but is built to be compatible with non-standard/broken HTML and XML, meaning it is ultimately more versatile.

Which you choose really depends on your use case: web scraping? Probably BS. Parsing machine-written structured metadata? lxml is a great choice.

There is also the learning curve to consider when making the switch: the two systems implement search and navigation strategies in slightly different ways, enough to make learning one system after starting with the other a non-trivial task.
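For example, the same lookup reads quite differently in the two systems; a small sketch on a made-up fragment:

    from bs4 import BeautifulSoup
    from lxml import etree

    doc = "<root><item id='1'>first</item><item id='2'>second</item></root>"

    # Beautiful Soup style: find_all and method calls on tag objects
    soup = BeautifulSoup(doc, "xml")
    print([item.get_text() for item in soup.find_all("item")])  # ['first', 'second']

    # lxml style: XPath expressions against the element tree
    tree = etree.fromstring(doc)
    print(tree.xpath("//item[@id='2']/text()"))  # ['second']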
