
Parsing Web Page Using Python 3.x and Beautiful Soup

I'm trying to parse this web page by the categories of its headers (Searchable By, Special Summoned from Hand by, etc.). I've looked around for a good parser for Python 3.3, but all I could find was BeautifulSoup (which I can't install because it's still written for 2.x) and lxml, which I can't understand. I tried reading the HTML itself and searching it for the headers, but to no avail. Can anyone help me?

Actually, you can use Beautiful Soup with Python 3.x. The Beautiful Soup homepage says:

Beautiful Soup 4 works on both Python 2 (2.6+) and Python 3.

Beautiful Soup is licensed under the MIT license, so you can also download the 
tarball, drop the bs4/ directory into almost any Python application (or into 
your library path) and start using it immediately. (If you want to do this under 
Python 3, you will need to manually convert the code using 2to3.)

If you need help manually converting Python 2 code to Python 3, refer to Converting BeautifulSoup 4 for Python 3 for instructions. HTH.
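For the original task of grouping content under each header, here is a minimal sketch. It assumes Beautiful Soup 4 is importable as `bs4` (for example after `pip install beautifulsoup4`) and uses the standard library's `html.parser` backend. The question does not show the page's markup, so `SAMPLE_HTML`, the `h2` header level, and the helper `group_by_header` are hypothetical stand-ins that you would adapt to the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's structure is not shown in the question.
SAMPLE_HTML = """
<html><body>
<h2>Searchable By</h2>
<ul><li>Card A</li><li>Card B</li></ul>
<h2>Special Summoned from Hand by</h2>
<ul><li>Card C</li></ul>
</body></html>
"""


def group_by_header(html, header_tag="h2"):
    """Map each header's text to the text of the list items that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    groups = {}
    for header in soup.find_all(header_tag):
        items = []
        # Walk forward through sibling tags until the next header of the same level.
        for sibling in header.find_next_siblings():
            if sibling.name == header_tag:
                break
            items.extend(li.get_text(strip=True) for li in sibling.find_all("li"))
        groups[header.get_text(strip=True)] = items
    return groups


if __name__ == "__main__":
    for title, entries in group_by_header(SAMPLE_HTML).items():
        print(title, entries)
```

If the real page puts its categories in a different tag or nests the entries differently, only `header_tag` and the inner `find_all("li")` call should need to change.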
