Python Web Scraping td class span

New to Python and web scraping... I have been trying to scrape the highlighted section of code so I can retrieve the numbers 1.16, 7.50 and 14.67, but I'm having no joy using pageSoup.find_all with td and the class table-matches__odds. Does anyone know what I'm missing here?

I'm using BeautifulSoup 4.

[Screenshot of the page's HTML with the relevant section highlighted; image not available]

Awkward.

First, I found the column of 'ratio' items (odds?), the td cells with class h-text-center, to use as reference points within the rows we want to plunder. I put them in a list called ratio.
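A minimal sketch of this first step, run offline. It assumes a hypothetical one-row sample (SAMPLE_HTML) rebuilt from the siblings printed in the session below; the contents of the h-text-center cell are not shown there, so that cell is left empty. It also swaps in the standard-library "html.parser" for lxml so nothing extra needs installing, and uses find_all with class_, the snake_case spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row sample, rebuilt from the answer's printed output.
# The h-text-center cell's contents are unknown, so it is left empty here.
SAMPLE_HTML = """
<table><tr>
<td class="h-text-center"></td>
<td class="table-matches__odds colored"><span><span><span data-odd="1.16"></span></span></span></td>
<td class="table-matches__odds" data-odd="7.50"></td>
<td class="table-matches__odds" data-odd="14.67"></td>
<td class="h-text-right h-text-no-wrap">21.05.2017</td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ (with the underscore) avoids clashing with Python's `class` keyword.
ratio = soup.find_all("td", class_="h-text-center")
print(len(ratio))  # 1 in this one-row sample
```

On the live page the same call returned 15 cells, one per match row.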

Then I looked at the next siblings of a typical element of ratio, namely the first one.

Since you're interested only in the first row of the table, I picked ratio[0] and asked for its next siblings, which are all td elements.
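The sibling step can be sketched against the same hypothetical sample row (SAMPLE_HTML, parsed with "html.parser" instead of lxml). As an optional refinement not used in the original answer, find_next_siblings also accepts a class_ filter, which keeps only the odds cells and drops the trailing date cell.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row sample, rebuilt from the answer's printed output.
SAMPLE_HTML = """
<table><tr>
<td class="h-text-center"></td>
<td class="table-matches__odds colored"><span><span><span data-odd="1.16"></span></span></span></td>
<td class="table-matches__odds" data-odd="7.50"></td>
<td class="table-matches__odds" data-odd="14.67"></td>
<td class="h-text-right h-text-no-wrap">21.05.2017</td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ratio = soup.find_all("td", class_="h-text-center")

# Every <td> after the reference cell in the same row, in document order.
all_sibs = ratio[0].find_next_siblings("td")

# Same call filtered to the odds cells; the date cell is left out.
odds_cells = ratio[0].find_next_siblings("td", class_="table-matches__odds")

print(len(all_sibs), len(odds_cells))  # 4 3
print([td.get("data-odd") for td in odds_cells])  # [None, '7.50', '14.67']
```

The None for the first cell is why that one needs special handling: its data-odd sits on a nested span, not on the td.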

I then extracted what you want from each of these, depending on its internal structure. The only complicated one was the first. I used the descendants iterator to get its descendants, picked the innermost one, and then read that element's attribute.
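Indexing into descendants works, but depends on the exact nesting depth. A slightly more defensive variant of the same extraction step, sketched under the same assumptions (hypothetical SAMPLE_HTML, "html.parser"): read data-odd from the td itself if present, otherwise search its descendants with an attrs filter of True, which matches any element that has the attribute. The helper name cell_odd is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row sample, rebuilt from the answer's printed output.
SAMPLE_HTML = """
<table><tr>
<td class="h-text-center"></td>
<td class="table-matches__odds colored"><span><span><span data-odd="1.16"></span></span></span></td>
<td class="table-matches__odds" data-odd="7.50"></td>
<td class="table-matches__odds" data-odd="14.67"></td>
<td class="h-text-right h-text-no-wrap">21.05.2017</td>
</tr></table>
"""


def cell_odd(td):
    """Return a cell's data-odd value, or None if it has none.

    Checks the <td> first, then falls back to the first descendant that
    carries the attribute (the nested <span> in the first odds cell).
    """
    value = td.get("data-odd")
    if value is None:
        inner = td.find(attrs={"data-odd": True})
        value = inner["data-odd"] if inner is not None else None
    return value


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ratio = soup.find_all("td", class_="h-text-center")
odds = [cell_odd(td) for td in ratio[0].find_next_siblings("td", class_="table-matches__odds")]
print(odds)  # ['1.16', '7.50', '14.67']
```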

>>> import bs4
>>> import requests
>>> page = requests.get('http://www.betexplorer.com/soccer/scotland/premiership-2016-2017/results/').text
>>> soup = bs4.BeautifulSoup(page, 'lxml')
>>> ratio = soup.find_all('td', class_='h-text-center')
>>> ratio[0].find_next_siblings()
[<td class="table-matches__odds colored"><span><span><span data-odd="1.16"></span></span></span></td>, <td class="table-matches__odds" data-odd="7.50"></td>, <td class="table-matches__odds" data-odd="14.67"></td>, <td class="h-text-right h-text-no-wrap">21.05.2017</td>]
>>> len(ratio)
15
>>> zeroth_ratio_sibs = ratio[0].find_next_siblings()
>>> # descendants of the first cell: <span>, <span>, <span data-odd="1.16">
>>> first_item = list(zeroth_ratio_sibs[0].descendants)[2].attrs['data-odd']
>>> first_item
'1.16'
>>> second_item = zeroth_ratio_sibs[1].attrs['data-odd']
>>> second_item
'7.50'
>>> third_item = zeroth_ratio_sibs[2].attrs['data-odd']
>>> third_item
'14.67'
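The question's original idea, searching for class table-matches__odds directly, can probably work too. One plausible reason it seemed not to is visible in the siblings printed above: these cells contain no text, and the numbers live in data-odd attributes, either on the td or on a nested span. The sketch below uses the same hypothetical SAMPLE_HTML and "html.parser", and reads both cases with a single CSS selector via select(). On the live page it would return three values per match row, assuming every row shares this layout.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row sample, rebuilt from the answer's printed output.
SAMPLE_HTML = """
<table><tr>
<td class="h-text-center"></td>
<td class="table-matches__odds colored"><span><span><span data-odd="1.16"></span></span></span></td>
<td class="table-matches__odds" data-odd="7.50"></td>
<td class="table-matches__odds" data-odd="14.67"></td>
<td class="h-text-right h-text-no-wrap">21.05.2017</td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all does match all three odds cells ("table-matches__odds colored"
# counts, since class matching works per class value), but their text is empty.
cells = soup.find_all("td", class_="table-matches__odds")
texts = [td.get_text(strip=True) for td in cells]
print(len(cells), texts)  # 3 ['', '', '']

# Match data-odd on the cell itself or on any element nested inside it;
# select() returns matches in document order.
odds = [el["data-odd"] for el in soup.select(
    "td.table-matches__odds[data-odd], td.table-matches__odds [data-odd]"
)]
print(odds)  # ['1.16', '7.50', '14.67']
```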
