Extracting data from an inconsistent HTML page using BeautifulSoup4 and Python
Python - Extracting data from web page using Beautifulsoup
I'm trying to scrape some data from a web page using bs4. Here is what I have so far:
import requests
from bs4 import BeautifulSoup

url = 'https://www.website.com'  # placeholder; requests needs a scheme such as https://
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for article in soup.find_all('section'):
    print(article)
The code above returns the following output:
<section>
<ul class="row-full-width" style="margin:0; list-style: none; padding-left: 0; font-size: 120%">
<li class="four columns">
Comp A:
<i class="icon-rupee"></i>
<b>136.90</b>
Cr.
</li>
<li class="four columns">
Comp B:
<i class="icon-rupee"></i>
<b>10.95</b>
</li>
<li class="four columns">
Comp C:
<i class="icon-rupee"></i> <b>49.60</b> / <b>10.20</b>
</li>
<li class="four columns">
Comp D:
<i class="icon-rupee"></i>
<b>6.61</b>
</li>
<li class="four columns">
Comp E:
<b>25.78</b>
</li>
<li class="four columns">
Comp F:
<b>0.00</b>
%
</li>
<li class="four columns">
Comp G:
<b>9.39</b>
%
</li>
<li class="four columns">
Comp H:
<b>6.54</b>
%
</li>
<li class="four columns">
Comp I:
<b>19.39</b>
%
</li>
<li class="four columns">
I'm trying to extract each Comp and its corresponding value.
Expected output:
Comp A,136.90 Cr
Comp B, 10.95
Comp C, 49.60/10.20
Comp D, 6.61
Comp E, 25.78
Comp F, 0.0%
Comp G, 9.39%
Comp H, 6.54%
Comp I, 19.39%
You can use the get_text() method with the separator= parameter and then split the resulting string.
For example (data contains your HTML string):
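from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')
print(soup.prettify())
for li in soup.select('li'):
    # Join the stripped text nodes with '|', then split them back into a list.
    row = li.get_text(strip=True, separator='|').split('|')
    # The first item is the label; the remaining items form the value.
    col1, col2 = row[0].replace(':', ''), ' '.join(row[1:])
    print('{:<20}{:<20}'.format(col1, col2))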
Prints:
Comp A              136.90 Cr.
Comp B              10.95
Comp C              49.60 / 10.20
Comp D              6.61
Comp E              25.78
Comp F              0.00 %
Comp G              9.39 %
Comp H              6.54 %
Comp I              19.39 %
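If you want the comma-separated form shown in the question, a variation along these lines might work. It is only a sketch: it uses the stripped_strings generator instead of get_text() plus split(), so a literal '|' in the page text cannot break the split. The data string is a shortened, hypothetical copy of the question's HTML, and extract_rows is a helper name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a shortened copy of the HTML from the question.
data = """
<section>
<ul class="row-full-width">
<li class="four columns">
Comp A:
<i class="icon-rupee"></i>
<b>136.90</b>
Cr.
</li>
<li class="four columns">
Comp C:
<i class="icon-rupee"></i> <b>49.60</b> / <b>10.20</b>
</li>
<li class="four columns">
Comp F:
<b>0.00</b>
%
</li>
</ul>
</section>
"""


def extract_rows(html):
    """Return a list of (label, value) tuples, one per <li> that has text."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for li in soup.select("li"):
        # stripped_strings yields each text node with surrounding
        # whitespace removed and skips whitespace-only nodes.
        parts = list(li.stripped_strings)
        if not parts:
            continue  # skip empty items, e.g. a truncated trailing <li>
        label = parts[0].rstrip(":")
        value = " ".join(parts[1:])
        # Tidy the spacing: "0.00 %" -> "0.00%", "49.60 / 10.20" -> "49.60/10.20".
        value = value.replace(" %", "%").replace(" / ", "/")
        rows.append((label, value))
    return rows


for label, value in extract_rows(data):
    print(f"{label},{value}")
```

Returning tuples rather than printing inside the loop keeps the parsing separate from the formatting, so the same rows could be written out with the csv module instead.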