Python Beautiful Soup Web Scraping?

I have used Beautiful Soup with the class_ argument to scrape a web page. When I used find it was fine, as I could call get_text() to get the text within the tags. However, I want several of the values shown in the output below.

boal_data = boal_soup(class_="investment-info__item grid__item lap--1-2 desk--1-2")
print(boal_data)

Printing this produces the following.

[<div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Investment Date</h2>
<p class="fontsize--h3">Apr 2018</p>
</div>, <div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Country</h2>
<p class="fontsize--h3">Netherlands</p>
</div>, <div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Revenue at ACQ.</h2>
<p class="fontsize--h3">€156m</p>
</div>, <div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Employees at ACQ.</h2>
<p class="fontsize--h3">370</p>
</div>]

I would like to be able to extract the following.

<p class="fontsize--h3">[this text here] </p>

How can I do this?

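Use find or find_all to get the text of the paragraph tags. find returns a single tag, so get_text() can be called on it directly, but find_all returns a list of tags, so get_text() has to be called on each element rather than on the result itself. You can try this:

# soup is the BeautifulSoup object (boal_soup in the question)
texts = [p.get_text() for p in soup.find_all("p", "fontsize--h3")]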
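
As a fuller illustration, here is a minimal sketch that runs against the HTML printed in the question. It assumes the page has already been fetched and uses the built-in "html.parser"; the names html, values and info are made up for this example and do not come from the original post. It also pairs each h2 label with its p value, which goes a little beyond what was asked but is often what you want from this kind of markup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question.
html = """
<div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Investment Date</h2>
<p class="fontsize--h3">Apr 2018</p>
</div>
<div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Country</h2>
<p class="fontsize--h3">Netherlands</p>
</div>
<div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Revenue at ACQ.</h2>
<p class="fontsize--h3">€156m</p>
</div>
<div class="investment-info__item grid__item lap--1-2 desk--1-2">
<h2 class="fontsize--p">Employees at ACQ.</h2>
<p class="fontsize--h3">370</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list of tags, so get_text() is called per tag.
values = [p.get_text(strip=True) for p in soup.find_all("p", class_="fontsize--h3")]
print(values)

# Optionally pair each heading with its value, one <div> at a time.
# Matching on a single class name is enough; the tag may carry others.
info = {}
for item in soup.find_all("div", class_="investment-info__item"):
    label = item.find("h2", class_="fontsize--p")
    value = item.find("p", class_="fontsize--h3")
    if label and value:
        info[label.get_text(strip=True)] = value.get_text(strip=True)
print(info)
```

Passing strip=True to get_text() trims surrounding whitespace, which helps when the real page has line breaks or padding inside the tags.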

