Python 3 extract html information from page

I have been googling, but I can't find a good Python 3 solution to my problem. Given the following HTML code, how do I extract 2019, 0.7 and 4.50% using Python 3?

<td rowspan='2' style='vertical-align:middle'>2019</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>4.50%</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>SGD 0.7</td>   <td>SGD0.2      </td>

A solution using BeautifulSoup:

from bs4 import BeautifulSoup

txt = '''<td rowspan='2' style='vertical-align:middle'>2019</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>4.50%</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>SGD 0.7</td>   <td>SGD0.2      </td>'''

soup = BeautifulSoup(txt, 'html.parser')

info_1, info_2, info_3, *_ = soup.select('td')

info_1 = info_1.get_text(strip=True)
info_2 = info_2.get_text(strip=True)
info_3 = info_3.get_text(strip=True).split()[-1]

print(info_1, info_2, info_3)

Prints:

2019 4.50% 0.7
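
Note that `.split()[-1]` relies on the space in `SGD 0.7`; for a cell such as `SGD0.2` it would return the whole string. Below is a minimal sketch of a more tolerant extraction, assuming the amount is always the last number in the cell. `extract_amount` is a hypothetical helper name, not part of BeautifulSoup.

```python
import re

def extract_amount(cell_text):
    """Return the last decimal number found in cell_text, or None if there is none."""
    # Match integers or decimals such as "2019", "0.7" or "4.50".
    matches = re.findall(r"\d+(?:\.\d+)?", cell_text)
    return matches[-1] if matches else None

print(extract_amount("SGD 0.7"))       # 0.7
print(extract_amount("SGD0.2      "))  # 0.2
```

The helper takes plain strings, so it can be applied to the result of `get_text(strip=True)` in place of `.split()[-1]`.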

This might be helpful even if it does not exactly answer your question:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print(data)

parser = MyHTMLParser()
parser.feed("<Your HTML here>")

For your particular case this will print each piece of text on its own line: 2019, 4.50%, SGD 0.7 and SGD0.2, plus the whitespace between the last two cells.
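
To get the values as variables rather than printed output, the parser can collect the text into a list instead. This is a minimal sketch using only the standard library; the class name `CellTextParser` and its `cells` attribute are made up for illustration.

```python
from html.parser import HTMLParser

class CellTextParser(HTMLParser):
    """Collect the stripped text of every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.cells = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between </td> and <td>
            self.cells.append(text)

html = (
    "<td rowspan='2' style='vertical-align:middle'>2019</td>"
    "<td rowspan='2' style='vertical-align:middle;font-weight:bold;'>4.50%</td>"
    "<td rowspan='2' style='vertical-align:middle;font-weight:bold;'>SGD 0.7</td>"
    "   <td>SGD0.2      </td>"
)

parser = CellTextParser()
parser.feed(html)
parser.close()

# The first three cells hold the year, the percentage and the amount.
year, percent, amount = parser.cells[:3]
amount = amount.split()[-1]  # drop the "SGD" prefix, as in the BeautifulSoup answer

print(year, percent, amount)  # 2019 4.50% 0.7
```

This keeps the no-dependency approach of `html.parser` while giving the same three values as the BeautifulSoup answer.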
