How to extract table columns and rows using BeautifulSoup in Python

BeautifulSoup noob here. Just for practice, I'm trying to extract the package and version columns of this page. I tried to get the table content using: table = soup.find('table', attrs={'class': 'listing sortable'}) but I'm not getting any useful data, and I'm quite lost.

Here's the screenshot: [image]

import requests
import bs4

url = 'https://launchpad.net/~openshot.developers/+archive/ubuntu/ppa'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, "html.parser")
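# The packages table has id="packages_list"; its data rows live in <tbody>.
tbody = soup.find(id='packages_list').tbody

for tr in tbody.find_all('tr'):
    cells = tr.find_all('td')
    # Relies on the page's layout: the package name is the third child node
    # of the first cell, and the version is the first child of the second cell.
    package = cells[0].contents[2].strip()
    version = cells[1].contents[0].strip()
    print('{0} - {1}'.format(package, version))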

I'm not sure what results you're getting right now, but try this out:

table = soup.find("table", id="packages_list")
row_data = []
for row in table.find_all("tr"):
    cols = [ele.text.strip() for ele in row.find_all("td")]
    # Rows without <td> cells (such as a <th>-only header row) are skipped.
    if cols:
        row_data.append(cols)
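
For trying the row-by-row approach without hitting the network, here is a minimal sketch that runs against an inline HTML sample. The markup in SAMPLE_HTML, the function name extract_columns, and the "example-pkg" row are assumptions made up for illustration; the real page's HTML may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the live HTML may differ.
SAMPLE_HTML = """
<table id="packages_list" class="listing sortable">
  <thead>
    <tr><th>Package</th><th>Version</th></tr>
  </thead>
  <tbody>
    <tr>
      <td><a href="#"></a> libopenshot</td>
      <td>0.1.4+0+588+107+201703310338+daily~ubuntu17.04.1</td>
    </tr>
    <tr>
      <td><a href="#"></a> example-pkg</td>
      <td>1.0
          (note)</td>
    </tr>
  </tbody>
</table>
"""


def extract_columns(html, table_id="packages_list", wanted=(0, 1)):
    """Return one tuple per data row, holding the text of the wanted columns."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # Skip rows that lack the wanted <td> cells (e.g. a <th>-only header).
        if len(cells) <= max(wanted):
            continue
        # Collapse runs of whitespace inside each cell into single spaces.
        rows.append(tuple(" ".join(cells[i].get_text().split()) for i in wanted))
    return rows


rows = extract_columns(SAMPLE_HTML)
for package, version in rows:
    print("{0} - {1}".format(package, version))
```

To use it on the real page, pass res.text from the requests call instead of SAMPLE_HTML.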

You can iterate over the tr tags and extract the package and version:

table = soup.find('table', attrs={'class': 'listing sortable'})
package = '' ; version = ''
for i in table.select('tr'):
    data = i.select('td')
    if data:
        package = data[0].text.strip()
        version = ' '.join(data[1].text.strip().split())
        print('{} : {} '.format(package,version))

#output
libopenshot : 0.1.4+0+588+107+201703310338+daily~ubuntu17.04.1 
libopenshot : 0.1.4+0+588+107+201703310338+daily~ubuntu15.04.1 
libopenshot : 0.1.4+0+588+107+201703310337+daily~ubuntu16.10.1 
libopenshot : 0.1.4+0+588+107+201703310337+daily~ubuntu16.04.1 
...
...
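
The ' '.join(... .split()) step in the code above is there because a cell's text can contain newlines and runs of spaces, which strip() alone only removes at the ends. A small sketch of the difference, using a made-up cell string (the "(changes file)" suffix and the helper name normalize_whitespace are assumptions for illustration):

```python
def normalize_whitespace(text):
    # str.split() with no arguments splits on any run of whitespace and
    # drops empty pieces, so joining with " " leaves single spaces only.
    return " ".join(text.split())


# Hypothetical cell text with a line break and indentation in the middle.
raw = "  0.1.4+0+588\n        (changes file)\t"

stripped = raw.strip()                # inner newline and indentation remain
normalized = normalize_whitespace(raw)

print(repr(stripped))
print(repr(normalized))
```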
