Web scraping issue while scraping from one table using BeautifulSoup
import requests
from bs4 import BeautifulSoup

page = requests.get('http://www.freejobalert.com/ap-govt-jobs/144586/')
c = page.content
soup = BeautifulSoup(c, "html5lib")
# Rows of the first table on the page only
row = soup.find_all("table")[0].find_all('tr')
dict = {}
for i in row:
    for title in i.find_all('span', attrs={'style': 'color: #008000;'}):
        dict['Title'] = title.text
    for link in i.find_all('a', title=True, href=True):
        dict['Link'] = link['href']
        print(dict)
Here, the data I get back is empty:
I expect:
{'Link': 'http://www.freejobalert.com/wp-content/uploads/2018/08/Detailed-Notification-Directorate-of-Public-Health-Family-Welfare-Vijayawada-Civil-Assistant-Surgeon-Posts.pdf', 'Title': 'Detailed Notification'}
{'Link': 'http://www.freejobalert.com/wp-content/uploads/2018/08/Notification-Directorate-of-Public-Health-Family-Welfare-Vijayawada-Civil-Assistant-Surgeon-Posts.pdf', 'Title': 'Notification '}
{'Link': 'http://cfw.ap.nic.in/', 'Title': ' Official Website'}
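I am trying to scrape data from the first table only, but the code gives me data from all the tables. I only want the Important Links from the first table, yet I get both. Please take a look at my code.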
I tested your code and it runs fine for me. The only change I made was renaming dict to some_dict, as shown below:
import requests
from bs4 import BeautifulSoup

page = requests.get('http://www.freejobalert.com/ap-govt-jobs/144586/')
c = page.content
soup = BeautifulSoup(c, "html5lib")
# Rows of the first table on the page only
row = soup.find_all("table")[0].find_all('tr')
some_dict = {}
for i in row:
    for title in i.find_all('span', attrs={'style': 'color: #008000;'}):
        some_dict['Title'] = title.text
    for link in i.find_all('a', title=True, href=True):
        some_dict['Link'] = link['href']
        print(some_dict)
I renamed it because the name dict shadows Python's built-in dict class. My output is:
{'Title': 'Detailed Notification', 'Link': 'http://www.freejobalert.com/wp-content/uploads/2018/08/Detailed-Notification-Directorate-of-Public-Health-Family-Welfare-Vijayawada-Civil-Assistant-Surgeon-Posts.pdf'}
{'Title': 'Notification ', 'Link': 'http://www.freejobalert.com/wp-content/uploads/2018/08/Notification-Directorate-of-Public-Health-Family-Welfare-Vijayawada-Civil-Assistant-Surgeon-Posts.pdf'}
{'Title': ' Official Website', 'Link': 'http://cfw.ap.nic.in/'}
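To illustrate why the name dict is worth avoiding, here is a minimal sketch of what shadowing does. The function names and the sample values are hypothetical and only serve the demonstration. Note that the script above never calls dict() after rebinding the name, so the rename on its own may not change what that script prints; it mainly prevents surprises later.

```python
def build_with_shadowed_name():
    # Rebinding `dict` makes the name refer to an instance, not the built-in type.
    dict = {}
    dict['Title'] = 'Detailed Notification'
    try:
        # A dict instance is not callable, so this raises TypeError.
        return dict(Link='http://cfw.ap.nic.in/')
    except TypeError as exc:
        return f"TypeError: {exc}"


def build_with_safe_name():
    # With a different variable name, the built-in dict() stays reachable.
    some_dict = {}
    some_dict['Title'] = 'Detailed Notification'
    some_dict.update(dict(Link='http://cfw.ap.nic.in/'))
    return some_dict


shadowed_result = build_with_shadowed_name()
safe_result = build_with_safe_name()
print(shadowed_result)
print(safe_result)
```

The first call reports a TypeError because the local name hides the built-in, while the second returns a dictionary with both keys.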
Does it work for you if you rename dict to something else?
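If the goal is one record per row from the first table only, a sketch along these lines may help. It is not tested against the real page: SAMPLE_HTML is a hypothetical stand-in for the page markup, first_table_links is a name made up for this example, and the built-in "html.parser" is used instead of html5lib so that the snippet needs no extra parser. It builds a fresh dictionary for each row instead of reusing one, and skips rows that lack either the green span or a link.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE_HTML = """
<table>
  <tr><td><span style="color: #008000;">Detailed Notification</span></td>
      <td><a title="Click Here" href="http://example.com/detailed.pdf">Click Here</a></td></tr>
  <tr><td>Row without a link</td></tr>
  <tr><td><span style="color: #008000;">Official Website</span></td>
      <td><a title="Click Here" href="http://example.com/">Click Here</a></td></tr>
</table>
<table>
  <tr><td><span style="color: #008000;">Other Table</span></td>
      <td><a title="Click Here" href="http://example.com/other">Click Here</a></td></tr>
</table>
"""


def first_table_links(html):
    """Return one {'Title', 'Link'} dict per matching row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    first_table = soup.find("table")  # first <table> in document order
    records = []
    if first_table is None:
        return records
    for tr in first_table.find_all("tr"):
        title = tr.find("span", attrs={"style": "color: #008000;"})
        link = tr.find("a", title=True, href=True)
        if title is None or link is None:
            continue  # skip rows that do not have both pieces
        # A new dict per row, so earlier records are never overwritten.
        records.append({"Title": title.get_text(strip=True), "Link": link["href"]})
    return records


results = first_table_links(SAMPLE_HTML)
for record in results:
    print(record)
```

Searching inside first_table rather than the whole soup is what keeps rows from later tables out of the result. If the real page nests tables inside one another, the selection of the first table would need to be adjusted.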