Web scraping: how to extract data from two identical tags using bs4 in Python
I am using bs4 with Python and trying to fetch data from a web page. I used Inspect Element on the information I want, but both elements have the same tag and class.
<a class="cell__value" data-tracker-action="click" data-tracker-label="information_technology.01" href="/markets/sectors/information-technology">
Information Technology
</a>
</div>
<div class="cell__return">
<div class="cell__label">
% Price Change
</div>
<div class="cell__value" data-type="better">
+0.05%
</div>
</div>
</div>
<div class="cell">
<div class="cell__name">
<div class="cell__label">
Industry
</div>
<a class="cell__value" data-tracker-action="click" data-tracker-label="information_technology.02" href="/markets/sectors/information-technology">
Software & Services
</a>
</div>
<div class="cell__return">
<div class="cell__label">
% Price Change
</div>
<div class="cell__value" data-type="worse">
-0.04%
</div>
</div>
</div>
</div>
I am doing it this way:
sect= soup.find("a",{"data-tracker-label":"information_technology.01"})
print sect.text
sect_per= soup.find("div",{"data-type":"worse"or"better"})
print sect_per.text
ind=soup.find("a",{"data-tracker-label":"information_technology.02"})
print ind.text
ind_per=soup.find("div",{"div",{"data-type":"worse"or"better"})
print ind_per
Both print sect_per.text and print ind_per give me the same result, because the two elements have the same class and tag.
I need to extract +0.05% and -0.04% respectively.
Please suggest a way to do it.
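from bs4 import BeautifulSoup

# `example` holds the HTML shown in the question
soup = BeautifulSoup(example, "html.parser")
for cell in soup.find_all("div", class_="cell"):
    name = ""
    # the sector/industry name is the <a> with class "cell__value"
    namecell = cell.find("a", class_="cell__value", text=True)
    if namecell is not None:
        name = namecell.get_text(strip=True)
    # the percentage is the <div> with the same class
    price_change = cell.find("div", class_="cell__value").get_text(strip=True)
    print("%s: Price Change: %s" % (name, price_change))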
Which outputs:
Information Technology: Price Change: +0.05%
Software & Services: Price Change: -0.04%
You can save those values for further processing.
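As a rough sketch of what "saving the values" could look like, the loop can fill a dictionary that maps each name to its percentage as a float. The markup below is a trimmed-down stand-in for the page in the question, and the name changes is made up for this illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the markup in the question (an assumption).
html = """
<div class="cell">
  <div class="cell__name">
    <a class="cell__value" href="/markets/sectors/information-technology">Information Technology</a>
  </div>
  <div class="cell__return">
    <div class="cell__value" data-type="better">+0.05%</div>
  </div>
</div>
<div class="cell">
  <div class="cell__name">
    <a class="cell__value" href="/markets/sectors/information-technology">Software &amp; Services</a>
  </div>
  <div class="cell__return">
    <div class="cell__value" data-type="worse">-0.04%</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

changes = {}  # hypothetical container: name -> percentage as a float
for cell in soup.find_all("div", class_="cell"):
    name_tag = cell.find("a", class_="cell__value")
    change_tag = cell.find("div", class_="cell__value")
    if name_tag is None or change_tag is None:
        continue  # skip cells that lack either piece
    text = change_tag.get_text(strip=True)
    # "+0.05%" -> 0.05, "-0.04%" -> -0.04
    changes[name_tag.get_text(strip=True)] = float(text.rstrip("%"))

print(changes)
```

Keeping the lookup scoped to each cell is what pairs a name with its own percentage, regardless of whether the value is marked "better" or "worse".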
The or operator returns its left operand if that operand is truthy (for a string, that means non-empty):
>>> "worse" or "better"
'worse'
So, the following line (with the stray {"div", removed so that it parses):
ind_per = soup.find("div", {"data-type": "worse" or "better"})
is effectively the same as:
ind_per = soup.find("div", {"data-type": "worse"})
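To see this without bs4 at all, a small plain-Python sketch shows the dictionary that is actually built before find() ever receives it. The variable names are only for illustration.

```python
# The `or` expression is evaluated first, so only one string
# ever ends up in the dictionary.
value = "worse" or "better"
attrs = {"data-type": "worse" or "better"}

print(value)  # worse
print(attrs)  # {'data-type': 'worse'}

# An empty string is falsy, so here `or` falls through to the right operand.
fallback = "" or "better"
print(fallback)  # better
```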
You need to query them separately:
ind_per = soup.find("div", {"data-type": "worse"})
print(ind_per)
ind_per = soup.find("div", {"data-type": "better"})
print(ind_per)
or use a for loop:
for data_type in ('worse', 'better'):
    ind_per = soup.find("div", {"data-type": data_type})
    print(ind_per)
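As a possible alternative, Beautiful Soup accepts a list as an attribute value and matches any item in it, so both kinds of div can be collected in one find_all call, in document order. The markup below is a reduced stand-in for the page in the question, and the name changes is hypothetical.

```python
from bs4 import BeautifulSoup

# Reduced stand-in for the markup in the question (an assumption).
html = """
<div class="cell">
  <div class="cell__value" data-type="better">+0.05%</div>
</div>
<div class="cell">
  <div class="cell__value" data-type="worse">-0.04%</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A list as the attribute value matches a tag whose attribute
# equals any one of the listed strings.
changes = [
    div.get_text(strip=True)
    for div in soup.find_all("div", {"data-type": ["worse", "better"]})
]
print(changes)
```

Because the results come back in document order, the first entry belongs to the first cell on the page and the second to the next one.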
<p class="sort-num_votes-visible">
<span class="text-muted">Votes:</span>
<span data-value="2333089" name="nv">2,333,089</span>
<span class="ghost">|</span> <span class="text-muted">Gross:</span>
<span data-value="28,341,469" name="nv">$28.34M</span>
</p>
I want to get the vote count and the gross for movies, but both spans have the same name, "nv", so we use indexing to tell them apart:
# find_all returns the matching spans in document order
nv_spans = container.find_all("span", {"name": "nv"})
vote = nv_spans[0].text   # first "nv" span: votes
gross = nv_spans[1].text  # second "nv" span: gross
Here it gets the vote count first and then the gross.
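Indexing by position raises an IndexError if one of the spans is missing. The sketch below guards against that, assuming (hypothetically) that some entries might have a vote count but no gross. The wrapper div and the variable names are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for one movie container, based on the snippet above.
html = """
<div class="container">
  <p class="sort-num_votes-visible">
    <span class="text-muted">Votes:</span>
    <span data-value="2333089" name="nv">2,333,089</span>
    <span class="ghost">|</span>
    <span class="text-muted">Gross:</span>
    <span data-value="28,341,469" name="nv">$28.34M</span>
  </p>
</div>
"""

container = BeautifulSoup(html, "html.parser")

# "name" must go in the attrs dict, because find_all's own first
# parameter is already called `name` (the tag name).
nv_spans = container.find_all("span", {"name": "nv"})

# Fall back to None instead of raising IndexError when a span is absent.
vote = nv_spans[0].get_text(strip=True) if len(nv_spans) > 0 else None
gross = nv_spans[1].get_text(strip=True) if len(nv_spans) > 1 else None

print(vote, gross)
```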