Python 3: extracting HTML information from a page

Question

I have been googling, but I can't find a good Python 3 solution to my problem. Given the following HTML code, how do I extract 2019, 0.7 and 4.50% using Python 3?

<td rowspan='2' style='vertical-align:middle'>2019</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>4.50%</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>SGD 0.7</td>   <td>SGD0.2      </td>

Answer 1

A solution using BeautifulSoup:

from bs4 import BeautifulSoup

txt = '''<td rowspan='2' style='vertical-align:middle'>2019</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>4.50%</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>SGD 0.7</td>   <td>SGD0.2      </td>'''

soup = BeautifulSoup(txt, 'html.parser')

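# Take the first three <td> cells; *_ discards any remaining cells.
info_1, info_2, info_3, *_ = soup.select('td')

info_1 = info_1.get_text(strip=True)
info_2 = info_2.get_text(strip=True)
# "SGD 0.7" -> "0.7": keep only the last whitespace-separated token.
info_3 = info_3.get_text(strip=True).split()[-1]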

print(info_1, info_2, info_3)

Prints:

2019 4.50% 0.7
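
Note that `.split()[-1]` relies on a space between the currency code and the number, so it would leave the fourth cell (`SGD0.2`) unchanged. If values without a space also need to be handled, a regular expression is one option. The following is only a sketch: the helper name `strip_currency` is hypothetical, and it assumes each cell holds a single plain decimal number.

```python
import re


def strip_currency(text):
    """Return the first decimal number found in text, or None (hypothetical helper)."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return match.group(0) if match else None


print(strip_currency("SGD 0.7"))       # 0.7
print(strip_currency("SGD0.2      "))  # 0.2
```

The values are returned as strings; convert them with `float()` if numeric work is needed.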

Answer 2

I think this might be helpful, even if it does not exactly answer your question:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print(data)

parser = MyHTMLParser()
parser.feed("<Your HTML here>")

For your particular case this will print each text node on its own line: 2019, 4.50%, SGD 0.7, a whitespace-only line (the gap between two cells), and SGD0.2 followed by trailing spaces.
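
To get the three values from the question using only the standard library, the parser can collect cell text instead of printing it. This is a minimal sketch rather than a definitive implementation: the class name `CellTextParser` and its attributes are made up for illustration, and it assumes the cells are flat `<td>` elements with no nested tables.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the stripped text of every <td> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished cell texts, in document order
        self._in_td = False  # True while the parser is inside a <td>
        self._buf = []       # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._buf = []

    def handle_data(self, data):
        # Ignore text outside cells, such as whitespace between <td> tags.
        if self._in_td:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.cells.append("".join(self._buf).strip())
            self._in_td = False


html = """<td rowspan='2' style='vertical-align:middle'>2019</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>4.50%</td><td rowspan='2' style='vertical-align:middle;font-weight:bold;'>SGD 0.7</td>   <td>SGD0.2      </td>"""

parser = CellTextParser()
parser.feed(html)
parser.close()

year, rate, amount = parser.cells[:3]
amount = amount.split()[-1]  # drop the "SGD" prefix, as in Answer 1
print(year, rate, amount)    # 2019 4.50% 0.7
```

Tracking whether the parser is inside a `<td>` is what filters out the whitespace-only text node between cells.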


Answer 1: score 0, accepted, 2020-06-09 11:43:38
Answer 2: score -1, 2020-06-09 11:45:33
