
Error while extracting URLs from a newspaper website

I am trying to collect the article URLs from https://timesofindia.indiatimes.com/india, but I get an error.

Here is the code I am trying:

urllist=[]
url=requests.get("https://timesofindia.indiatimes.com/india")
content=url.content
soup=BeautifulSoup(content,'lxml')
counter=0
for divtag in soup.find_all('div',{'class':'container wrapper clearfix'}):
    for ultag in divtag.find_all('ul',{'class':'list5 clearfix'}):
        if (counter<=30) :
            for litag in divtag.find_all('li'):
                counter=counter+1
                newurl='https://timesofindia.indiatimes.com/india'+litag.find('a')['href']
                urllist.append(newurl)

This is the error I get:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-67-a0e2d46fa0d4> in <module>
     10             for litag in divtag.find_all('li'):
     11                 counter=counter+1
---> 12                 newurl='https://timesofindia.indiatimes.com/india'+litag.find('a')['href']
     13                 urllist.append(newurl)

TypeError: 'NoneType' object is not subscriptable

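Can anyone suggest how to fix it?

It seems that some <li> tags do not contain an <a> tag. For those items litag.find('a') returns None, so litag.find('a')['href'] fails with the TypeError shown above. You need to check for that possibility: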

import requests
from bs4 import BeautifulSoup


urllist=[]
url=requests.get("https://timesofindia.indiatimes.com/india")
content=url.content
soup=BeautifulSoup(content,'lxml')

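# li:has(a) selects only the <li> items that contain an <a>, so find('a') is never None here
for li in soup.select('ul.list5.clearfix li:has(a)'):
    href = li.find('a')['href']
    if href.startswith('http'):
        # already an absolute URL
        urllist.append(href)
    else:
        # site-relative path: prepend the site root
        urllist.append('https://timesofindia.indiatimes.com'+href)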

for u in urllist:
    print(u)

# if you want only first 30 links:
# print(urllist[:30])
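
For comparison, the same guard can be written as an explicit None check inside a loop, which is closer to the structure of the code in the question. This is only a minimal sketch and not part of the original answer: the HTML snippet and the helper name extract_links are made up for illustration so that it runs without a network request, and it uses Python's built-in html.parser instead of lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: the second <li> has no <a>.
SAMPLE_HTML = """
<ul class="list5 clearfix">
  <li><a href="/india/story-one/articleshow/1.cms">Story one</a></li>
  <li>Advertisement</li>
  <li><a href="/india/story-two/articleshow/2.cms">Story two</a></li>
</ul>
"""


def extract_links(html, limit=30):
    """Return up to `limit` href values from <li> items that contain a link."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for li in soup.select("ul.list5.clearfix li"):
        a = li.find("a")
        # find() returns None when there is no <a>; skip it instead of subscripting None.
        if a is None or not a.has_attr("href"):
            continue
        links.append(a["href"])
        if len(links) >= limit:
            break
    return links


print(extract_links(SAMPLE_HTML))
```

Stopping inside the loop once the limit is reached plays the role of the counter in the question, without processing the rest of the list.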

Prints:

https://timesofindia.indiatimes.com/india/ips-officers-commitment-to-service-will-inspire-youngsters-to-join-police-force-amit-shah/articleshow/77929114.cms
https://timesofindia.indiatimes.com/india/hope-to-see-comprehensive-peaceful-resolution-of-decade-long-syrian-conflict-india/articleshow/77929158.cms
https://timesofindia.indiatimes.com/india/on-indias-request-russia-reiterates-policy-of-no-arms-supply-to-pakistan/articleshow/77929028.cms
https://timesofindia.indiatimes.com/india/unease-in-bihars-ruling-alliance-ahead-of-assembly-polls/articleshow/77928891.cms
https://timesofindia.indiatimes.com/india/terrorist-killed-army-officer-inured-in-encounter-in-jk/articleshow/77928997.cms
https://timesofindia.indiatimes.com/india/bihar-elections-65-pending-bypolls-to-be-held-around-same-time-ec/articleshow/77928828.cms
https://timesofindia.indiatimes.com/india/rajnath-singh-likely-to-meet-chinese-defence-minister-in-moscow-this-evening/articleshow/77927857.cms
https://timesofindia.indiatimes.com/india/when-i-met-maoist-leader-ganapathy/articleshow/77910681.cms
https://timesofindia.indiatimes.com/india/mega-project-desi-rt-pcr-reagents-kits-launched-in-bluru/articleshow/77928163.cms
https://timesofindia.indiatimes.com/india/make-police-stations-centres-of-social-trust-pm-modi-to-ips-probationers/articleshow/77927976.cms
https://timesofindia.indiatimes.com/india/shiv-sena-defends-maharashtra-govt-over-ips-transfers-slams-bjp/articleshow/77927134.cms
https://timesofindia.indiatimes.com/india/bengaluru-riots-pre-planned-communally-motivated-fact-finding-report/articleshow/77927225.cms
https://timesofindia.indiatimes.com/india/find-solutions-to-problems-being-faced-by-youth-rahul-to-govt/articleshow/77927080.cms
https://timesofindia.indiatimes.com/india/why-the-opposition-is-protesting-the-scrapping-of-question-hour/articleshow/77922909.cms
https://timesofindia.indiatimes.com/india/sc-rejects-1984-riots-convict-sajjan-kumars-plea-seeking-interim-bail-on-health-ground/articleshow/77926819.cms
https://timesofindia.indiatimes.com/india/situation-along-china-border-serious-indian-army-taken-ample-precautionary-steps-army-chief-mm-naravane/articleshow/77925590.cms
https://timesofindia.indiatimes.com/india/india-looks-to-provide-fresh-impetus-to-ties-with-dhaka-with-waterways-vaccine/articleshow/77925190.cms
https://timesofindia.indiatimes.com/india/indias-covid-19-tally-goes-past-39-lakh-number-of-recoveries-crosses-30-lakh-mark/articleshow/77923135.cms
https://timesofindia.indiatimes.com/india/lac-face-off-india-steps-up-scrutiny-of-chinese-influence-group/articleshow/77924211.cms
https://timesofindia.indiatimes.com/india/two-wheeler-crash-deaths-more-than-double-in-a-decade/articleshow/77923795.cms
https://timesofindia.indiatimes.com/india/these-patients-have-borne-the-brunt-of-lockdown/articleshow/77904714.cms
https://timesofindia.indiatimes.com/india/encounter-breaks-out-between-militants-security-forces-in-jks-baramulla/articleshow/77922666.cms
https://timesofindia.indiatimes.com/india/males-account-for-81-accident-deaths/articleshow/77922362.cms
https://timesofindia.indiatimes.com/india/rail-board-gets-first-ceo-cum-chairman/articleshow/77922211.cms
https://timesofindia.indiatimes.com/india/deaths-due-to-heart-attacks-up-by-53-in-5-years-ncrb/articleshow/77922046.cms
https://timesofindia.indiatimes.com/india/some-in-g-23-wont-escalate-matters-give-leadership-time/articleshow/77922028.cms
https://timesofindia.indiatimes.com/india/netas-41-more-likely-to-respond-to-locals-than-migrants-study/articleshow/77922010.cms
https://timesofindia.indiatimes.com/india/kafeel-shifts-to-rajasthan-says-priyanka-assured-him-of-a-safe-stay-there/articleshow/77921983.cms
https://timesofindia.indiatimes.com/india/how-long-do-antibodies-of-covid-last-jurys-still-out/articleshow/77921936.cms
https://timesofindia.indiatimes.com/india/covid-19-11-7-lakh-tests-done-in-a-day-total-crosses-4-5-crore/articleshow/77921930.cms
https://timesofindia.indiatimes.com/india/covid-19-positive-sign-in-five-high-caseload-states/articleshow/77921757.cms
https://timesofindia.indiatimes.com/india/covid-19-over-83000-cases-india-sees-new-high-recoveries-cross-30-lakh/articleshow/77921767.cms
https://timesofindia.indiatimes.com/india/rajya-sabha-question-hour-suspended-six-times-in-the-past/articleshow/77921753.cms
https://timesofindia.indiatimes.com/india/pm-modis-donations-to-public-causes-exceed-rs-103-crore/articleshow/77921741.cms
https://timesofindia.indiatimes.com/india/we-are-not-partisan-facebook-tells-congress/articleshow/77921686.cms
https://timesofindia.indiatimes.com/india/end-contempt-of-court-provision-says-prashant-bhushan/articleshow/77921678.cms
https://timesofindia.indiatimes.com/india/must-plan-for-two-pronged-conflict-says-general-bipin-rawat/articleshow/77921605.cms
https://timesofindia.indiatimes.com/india/will-meet-wang-in-moscow-says-jaishankar/articleshow/77921583.cms
https://timesofindia.indiatimes.com/india/chinese-counterpart-wants-to-meet-rajnath-non-committal/articleshow/77920983.cms
https://timesofindia.indiatimes.com/india/pm-modis-twitter-account-linked-to-personal-site-hacked-restored/articleshow/77920860.cms
https://timesofindia.indiatimes.com/india/covid-19-india-only-country-in-worst-hit-20-with-cases-yet-to-peak/articleshow/77920640.cms
https://timesofindia.indiatimes.com/india/pm-modi-pitches-india-as-trusted-business-partner-at-us-india-strategic-partnership-forum/articleshow/77920570.cms
https://timesofindia.indiatimes.com/india/army-iaf-chiefs-visit-forward-areas-as-china-moves-more-forces-near-chushul/articleshow/77920284.cms
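
The answer's code distinguishes hrefs that are already absolute from those that need the site prefix. As an alternative to testing the string by hand, the standard library's urllib.parse.urljoin handles both cases. The following is a sketch rather than the answer's method; the sample hrefs and the name BASE are illustrative assumptions.

```python
from urllib.parse import urljoin

# Page the links were scraped from; used as the base for resolving relative hrefs.
BASE = "https://timesofindia.indiatimes.com/india"

# Made-up examples: one root-relative path and one URL that is already absolute.
hrefs = [
    "/india/example-story/articleshow/1.cms",
    "https://timesofindia.indiatimes.com/india/other-story/articleshow/2.cms",
]

# urljoin resolves a relative href against BASE and returns an absolute href unchanged.
absolute = [urljoin(BASE, h) for h in hrefs]
for url in absolute:
    print(url)
```

Judging by the output above, the site's hrefs are root-relative paths that already begin with /india, so concatenating them onto a base that ends in /india, as the question's code does, would duplicate that path segment. Resolving against the base URL avoids this.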
 
