Python AttributeError: 'NoneType' object has no attribute 'find_all'
I am scraping a web page called CVE Trends:
import bs4, requests, webbrowser

LINK = "https://cvetrends.com/"
PRE_LINK = "https://nvd.nist.gov/"

response = requests.get(LINK)
response.raise_for_status()
soup = bs4.BeautifulSoup(response.text, 'html.parser')
div_tweets = soup.find('div', class_='tweet_text')
a_tweets = div_tweets.find_all('a')
link_tweets = []
for a_tweet in a_tweets:
    link_tweet = str(a_tweet.get('href'))
    if PRE_LINK in link_tweet:
        link_tweets.append(link_tweet)

from pprint import pprint
pprint(link_tweets)
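This is the code I have written so far. I have tried many things, but it always gives the same error:
'NoneType' object has no attribute 'find_all'
Can anyone help me? I really need this. Thanks in advance for any answer.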
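This is because soup.find("div", class_="tweet_text") does not find anything, so it returns None. That happens because the site you are trying to scrape is populated using JavaScript, so when you send a GET request to it, this is what you get back: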
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1, shrink-to-fit=no" name="viewport"/>
<title>
CVE Trends - crowdsourced CVE intel
</title>
<meta content="Monitor real-time, crowdsourced intel about trending CVEs on Twitter." name="description"/>
<meta content="trending CVEs, CVE intel, CVE trends" name="keywords"/>
<meta content="CVE Trends - crowdsourced CVE intel" name="title" property="og:title">
<meta content="Simon Bell" name="author"/>
<meta content="website" property="og:type">
<meta content="https://cvetrends.com/images/cve-trends.png" name="image" property="og:image">
<meta content="https://cvetrends.com" property="og:url">
<meta content="Monitor real-time, crowdsourced intel about trending CVEs on Twitter." property="og:description"/>
<meta content="en_GB" property="og:locale"/>
<meta content="en_US" property="og:locale:alternative"/>
<meta content="CVE Trends" property="og:site_name"/>
<meta content="summary_large_image" name="twitter:card"/>
<meta content="@SimonByte" name="twitter:creator"/>
<meta content="CVE Trends - crowdsourced CVE intel" name="twitter:title"/>
<meta content="Monitor real-time, crowdsourced intel about trending CVEs on Twitter." name="twitter:description"/>
<meta content="https://cvetrends.com/images/cve-trends.png" name="twitter:image"/>
<link href="https://cvetrends.com/favicon.ico" id="favicon" rel="icon" sizes="32x32"/>
<link href="https://cvetrends.com/apple-touch-icon.png" id="apple-touch-icon" rel="apple-touch-icon"/>
<link href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/5.1.0/css/bootstrap.min.css" rel="stylesheet"/>
</meta>
</meta>
</meta>
</meta>
</head>
<body>
<div id="root">
</div>
<noscript>
Please enable JavaScript to run this app.
</noscript>
<script src="https://cvetrends.com/js/main.d0aa7136854f54748577.bundle.js">
</script>
</body>
</html>
You can verify this with print(soup.prettify()).
To be able to scrape this site, you will probably have to use something like Selenium.
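Whichever tool ends up fetching the page, it may help to guard against find() returning None, so that a missing element yields an empty result instead of an AttributeError. The following is only a minimal sketch: the two HTML strings are hypothetical stand-ins (one for the empty shell shown above, one for what rendered markup might look like), and extract_nvd_links is an illustrative helper name, not part of the original code.

```python
import bs4

PRE_LINK = "https://nvd.nist.gov/"

# Hypothetical stand-in for the JavaScript-free shell the server returns.
EMPTY_SHELL = '<html><body><div id="root"></div></body></html>'

# Hypothetical stand-in for the markup after JavaScript has run (assumed shape).
RENDERED = (
    '<div class="tweet_text">'
    '<a href="https://nvd.nist.gov/vuln/detail/CVE-0000-0000">NVD</a>'
    '<a href="https://example.com/other">other</a>'
    '</div>'
)


def extract_nvd_links(html):
    """Return hrefs under the first div.tweet_text that point at PRE_LINK."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    div_tweets = soup.find("div", class_="tweet_text")
    if div_tweets is None:
        # find() found nothing, so there is nothing to call find_all() on.
        return []
    links = []
    for a_tweet in div_tweets.find_all("a"):
        href = a_tweet.get("href")
        if href and PRE_LINK in href:
            links.append(href)
    return links


print(extract_nvd_links(EMPTY_SHELL))
print(extract_nvd_links(RENDERED))
```

With the empty shell the function returns an empty list rather than raising, which makes it easier to tell "the element is not in the HTML I received" apart from a bug in the parsing logic.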
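This happens because you are not getting the response you actually want.
The site's content is loaded by JavaScript, so a plain request does not return the data you are after.
Instead of scraping the page, you can fetch the data from https://cvetrends.com/api/cves/24hrs
Here is one possible solution: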
import requests
import json
from urlextract import URLExtract

LINK = "https://cvetrends.com/api/cves/24hrs"
PRE_LINK = "https://nvd.nist.gov/"
link_tweets = []

# library for URL extraction
extractor = URLExtract()

# fetch the response from LINK (a JSON response)
html = requests.get(LINK).text
# parse the JSON string into Python objects
twitt_json = json.loads(html)
twitt_datas = twitt_json.get('data') or []
for twitt_data in twitt_datas:
    # extract tweets (fall back to an empty list if the key is missing)
    twitts = twitt_data.get('tweets') or []
    for twitt in twitts:
        # extract the tweet text and check the condition
        twitt_text = twitt.get('tweet_text') or ''
        if PRE_LINK in twitt_text:
            # find URLs in the text
            urls_list = extractor.find_urls(twitt_text)
            for url in urls_list:
                if PRE_LINK in url:
                    # keep the matching link rather than the whole tweet text
                    link_tweets.append(url)
print(link_tweets)
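To try the filtering logic without network access or the urlextract dependency, the same traversal can be run against an in-memory payload. This is a rough sketch under stated assumptions: SAMPLE is a made-up payload that only mirrors the keys used above ('data', 'tweets', 'tweet_text'), the real API response may contain more fields or differ in shape, and a simple regular expression is swapped in for URLExtract. collect_nvd_links is a hypothetical helper name.

```python
import re

PRE_LINK = "https://nvd.nist.gov/"

# Hypothetical payload; only the key names are taken from the answer above.
SAMPLE = {
    "data": [
        {"tweets": [
            {"tweet_text": "Details: https://nvd.nist.gov/vuln/detail/CVE-0000-0000 "
                           "and https://example.com/x"},
            {"tweet_text": "No link here"},
        ]},
        {"tweets": None},  # missing tweets should not break the loop
    ]
}

# Simplistic URL matcher (a stand-in for URLExtract, not equivalent to it).
URL_PATTERN = re.compile(r"https?://\S+")


def collect_nvd_links(payload):
    """Collect unique URLs starting with PRE_LINK from every tweet text."""
    links = []
    for entry in payload.get("data") or []:
        for tweet in entry.get("tweets") or []:
            text = tweet.get("tweet_text") or ""
            for url in URL_PATTERN.findall(text):
                if url.startswith(PRE_LINK) and url not in links:
                    links.append(url)
    return links


print(collect_nvd_links(SAMPLE))
```

The regular expression treats any run of non-whitespace after the scheme as part of the URL, so trailing punctuation in a tweet would be captured too; URLExtract handles such cases more carefully, which is a reasonable argument for keeping it in real use.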