Error while extracting URLs from a newspaper website
I am trying to collect article URLs from https://timesofindia.indiatimes.com/india, but I get an error.
Here is the code I am trying:
urllist=[]
url=requests.get("https://timesofindia.indiatimes.com/india")
content=url.content
soup=BeautifulSoup(content,'lxml')
counter=0
for divtag in soup.find_all('div',{'class':'container wrapper clearfix'}):
    for ultag in divtag.find_all('ul',{'class':'list5 clearfix'}):
        if (counter<=30) :
            for litag in divtag.find_all('li'):
                counter=counter+1
                newurl='https://timesofindia.indiatimes.com/india'+litag.find('a')['href']
                urllist.append(newurl)
This is the error I get:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-67-a0e2d46fa0d4> in <module>
10 for litag in divtag.find_all('li'):
11 counter=counter+1
---> 12 newurl='https://timesofindia.indiatimes.com/india'+litag.find('a')['href']
13 urllist.append(newurl)
TypeError: 'NoneType' object is not subscriptable
Can anyone suggest how to fix it?
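It seems that some <li> tags do not contain an <a> tag. For those items litag.find('a') returns None, so litag.find('a')['href'] fails with the TypeError shown above. You need to check for that possibility: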
import requests
from bs4 import BeautifulSoup

urllist=[]
url=requests.get("https://timesofindia.indiatimes.com/india")
content=url.content
soup=BeautifulSoup(content,'lxml')

for li in soup.select('ul.list5.clearfix li:has(a)'):
    href = li.find('a')['href']
    if 'http' in href:
        urllist.append(href)
    else:
        urllist.append('https://timesofindia.indiatimes.com'+href)

for u in urllist:
    print(u)

# if you want only first 30 links:
# print(urllist[:30])
Prints:
https://timesofindia.indiatimes.com/india/ips-officers-commitment-to-service-will-inspire-youngsters-to-join-police-force-amit-shah/articleshow/77929114.cms
https://timesofindia.indiatimes.com/india/hope-to-see-comprehensive-peaceful-resolution-of-decade-long-syrian-conflict-india/articleshow/77929158.cms
https://timesofindia.indiatimes.com/india/on-indias-request-russia-reiterates-policy-of-no-arms-supply-to-pakistan/articleshow/77929028.cms
https://timesofindia.indiatimes.com/india/unease-in-bihars-ruling-alliance-ahead-of-assembly-polls/articleshow/77928891.cms
https://timesofindia.indiatimes.com/india/terrorist-killed-army-officer-inured-in-encounter-in-jk/articleshow/77928997.cms
https://timesofindia.indiatimes.com/india/bihar-elections-65-pending-bypolls-to-be-held-around-same-time-ec/articleshow/77928828.cms
https://timesofindia.indiatimes.com/india/rajnath-singh-likely-to-meet-chinese-defence-minister-in-moscow-this-evening/articleshow/77927857.cms
https://timesofindia.indiatimes.com/india/when-i-met-maoist-leader-ganapathy/articleshow/77910681.cms
https://timesofindia.indiatimes.com/india/mega-project-desi-rt-pcr-reagents-kits-launched-in-bluru/articleshow/77928163.cms
https://timesofindia.indiatimes.com/india/make-police-stations-centres-of-social-trust-pm-modi-to-ips-probationers/articleshow/77927976.cms
https://timesofindia.indiatimes.com/india/shiv-sena-defends-maharashtra-govt-over-ips-transfers-slams-bjp/articleshow/77927134.cms
https://timesofindia.indiatimes.com/india/bengaluru-riots-pre-planned-communally-motivated-fact-finding-report/articleshow/77927225.cms
https://timesofindia.indiatimes.com/india/find-solutions-to-problems-being-faced-by-youth-rahul-to-govt/articleshow/77927080.cms
https://timesofindia.indiatimes.com/india/why-the-opposition-is-protesting-the-scrapping-of-question-hour/articleshow/77922909.cms
https://timesofindia.indiatimes.com/india/sc-rejects-1984-riots-convict-sajjan-kumars-plea-seeking-interim-bail-on-health-ground/articleshow/77926819.cms
https://timesofindia.indiatimes.com/india/situation-along-china-border-serious-indian-army-taken-ample-precautionary-steps-army-chief-mm-naravane/articleshow/77925590.cms
https://timesofindia.indiatimes.com/india/india-looks-to-provide-fresh-impetus-to-ties-with-dhaka-with-waterways-vaccine/articleshow/77925190.cms
https://timesofindia.indiatimes.com/india/indias-covid-19-tally-goes-past-39-lakh-number-of-recoveries-crosses-30-lakh-mark/articleshow/77923135.cms
https://timesofindia.indiatimes.com/india/lac-face-off-india-steps-up-scrutiny-of-chinese-influence-group/articleshow/77924211.cms
https://timesofindia.indiatimes.com/india/two-wheeler-crash-deaths-more-than-double-in-a-decade/articleshow/77923795.cms
https://timesofindia.indiatimes.com/india/these-patients-have-borne-the-brunt-of-lockdown/articleshow/77904714.cms
https://timesofindia.indiatimes.com/india/encounter-breaks-out-between-militants-security-forces-in-jks-baramulla/articleshow/77922666.cms
https://timesofindia.indiatimes.com/india/males-account-for-81-accident-deaths/articleshow/77922362.cms
https://timesofindia.indiatimes.com/india/rail-board-gets-first-ceo-cum-chairman/articleshow/77922211.cms
https://timesofindia.indiatimes.com/india/deaths-due-to-heart-attacks-up-by-53-in-5-years-ncrb/articleshow/77922046.cms
https://timesofindia.indiatimes.com/india/some-in-g-23-wont-escalate-matters-give-leadership-time/articleshow/77922028.cms
https://timesofindia.indiatimes.com/india/netas-41-more-likely-to-respond-to-locals-than-migrants-study/articleshow/77922010.cms
https://timesofindia.indiatimes.com/india/kafeel-shifts-to-rajasthan-says-priyanka-assured-him-of-a-safe-stay-there/articleshow/77921983.cms
https://timesofindia.indiatimes.com/india/how-long-do-antibodies-of-covid-last-jurys-still-out/articleshow/77921936.cms
https://timesofindia.indiatimes.com/india/covid-19-11-7-lakh-tests-done-in-a-day-total-crosses-4-5-crore/articleshow/77921930.cms
https://timesofindia.indiatimes.com/india/covid-19-positive-sign-in-five-high-caseload-states/articleshow/77921757.cms
https://timesofindia.indiatimes.com/india/covid-19-over-83000-cases-india-sees-new-high-recoveries-cross-30-lakh/articleshow/77921767.cms
https://timesofindia.indiatimes.com/india/rajya-sabha-question-hour-suspended-six-times-in-the-past/articleshow/77921753.cms
https://timesofindia.indiatimes.com/india/pm-modis-donations-to-public-causes-exceed-rs-103-crore/articleshow/77921741.cms
https://timesofindia.indiatimes.com/india/we-are-not-partisan-facebook-tells-congress/articleshow/77921686.cms
https://timesofindia.indiatimes.com/india/end-contempt-of-court-provision-says-prashant-bhushan/articleshow/77921678.cms
https://timesofindia.indiatimes.com/india/must-plan-for-two-pronged-conflict-says-general-bipin-rawat/articleshow/77921605.cms
https://timesofindia.indiatimes.com/india/will-meet-wang-in-moscow-says-jaishankar/articleshow/77921583.cms
https://timesofindia.indiatimes.com/india/chinese-counterpart-wants-to-meet-rajnath-non-committal/articleshow/77920983.cms
https://timesofindia.indiatimes.com/india/pm-modis-twitter-account-linked-to-personal-site-hacked-restored/articleshow/77920860.cms
https://timesofindia.indiatimes.com/india/covid-19-india-only-country-in-worst-hit-20-with-cases-yet-to-peak/articleshow/77920640.cms
https://timesofindia.indiatimes.com/india/pm-modi-pitches-india-as-trusted-business-partner-at-us-india-strategic-partnership-forum/articleshow/77920570.cms
https://timesofindia.indiatimes.com/india/army-iaf-chiefs-visit-forward-areas-as-china-moves-more-forces-near-chushul/articleshow/77920284.cms
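
For comparison, here is a minimal offline sketch of the same idea that skips <li> items without a link by checking for None explicitly, and builds absolute URLs with urllib.parse.urljoin instead of string concatenation. The SAMPLE_HTML markup, the extract_links helper and its limit parameter are hypothetical and exist only for illustration; the real page structure may differ and may have changed since this answer was written.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup: the second <li> has no <a>,
# which is the case that raised the TypeError in the question.
SAMPLE_HTML = """
<ul class="list5 clearfix">
  <li><a href="/india/story-one/articleshow/1.cms">One</a></li>
  <li><span>no link here</span></li>
  <li><a href="https://timesofindia.indiatimes.com/india/story-two/articleshow/2.cms">Two</a></li>
</ul>
"""

BASE_URL = "https://timesofindia.indiatimes.com/india"


def extract_links(html, base_url=BASE_URL, limit=30):
    """Return up to `limit` absolute URLs found in ul.list5.clearfix items."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for li in soup.select("ul.list5.clearfix li"):
        a = li.find("a", href=True)  # None when the item has no <a href=...>
        if a is None:
            continue
        # urljoin leaves absolute hrefs alone and resolves relative ones
        # against base_url.
        links.append(urljoin(base_url, a["href"]))
        if len(links) >= limit:
            break
    return links


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

To run this against the live page, pass requests.get(BASE_URL).content as the html argument. Using urljoin also avoids a second problem in the question's code, where prefixing 'https://timesofindia.indiatimes.com/india' to an href that already starts with /india would duplicate that path segment.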