
Scrape IMG SRC under DIV tag Using BeautifulSoup

I'm trying to get the src of an image that resides under a div tag. My code gives me an error: KeyError: 'src'

HTML of the Engadget.com blog page 10

Here's my code:

import sys
from urllib.request import urlopen
from bs4 import BeautifulSoup

# csv_writer is a csv.writer created earlier in the script (not shown)
for page in range(1,4):
    # code that gets dynamic URL
    url = sys.argv[1] + "{}".format(page)
    print(url)
    html=urlopen(url)
    soup=BeautifulSoup(html,"lxml")

    for article in soup.find_all('article',class_='o-hit'):
        div=soup.find('div',{"class":"o-rating_thumb@m-"})
        img_src = div.find('img').attrs['src']
        #img_src = article.find('div',class_ ='o-rating_thumb c-white').img['src']
        headline = article.h2.text.strip()

        summary = article.find('p',class_ ='mt-15@m+ t-d5@m- t-d5@tp+ c-gray-3').text.strip()

        #img_src = "none"

        print(headline)
        print(summary)
        print(img_src)
        csv_writer.writerow([headline,summary,img_src])

The web page is here: Engadget blog page 10

For the topmost news item on each page, you can get the image source from the 'src' attribute itself.

You can first navigate to the div that contains the image using the find() method. Then, within that div, you can find the img tag and read its source from its attributes.

import requests
from bs4 import BeautifulSoup
url='https://www.engadget.com/reviews/latest/page/10/'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
div=soup.find('div',{"class":"o-rating_thumb@m-"})
print(div.find('img').attrs['src'])

Output:

https://o.aolcdn.com/images/dims?resize=810%2C455&crop=810%2C455%2C0%2C0&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1400%252C933%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1066%26image_uri%3Dhttp%253A%252F%252Fo.aolcdn.com%252Fhss%252Fstorage%252Fmidas%252F85a4e2b124ba329ab520e80e306f07eb%252F206517051%252FIMG_5243e.jpg%26client%3Da1acac3e1b3290917d92%26signature%3Dcea6158d0bf02768d31ee67f2694be6cafaf200c&client=amp-blogside-v2&signature=08a97a1109f1c3287c6766fa284104c6f78770fe

Edit, to scrape all news items on a page:

Even though the first image has a src attribute, the subsequent images do not; to scrape them we have to use the data-original attribute instead (you can check the page source to confirm this). I think this is why you are getting the KeyError: the img tags after the first one have no 'src' key in their attributes.
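As a rough illustration of that difference, here is a minimal sketch that tries data-original first and falls back to src. The inline HTML, the example.com URLs, and the helper name image_url are made up for the example and are not taken from the actual page; only the attribute names come from the discussion here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the pattern described here:
# the first image carries src, a lazy-loaded one carries data-original.
HTML = """
<article class="o-hit"><img src="https://example.com/first.jpg"></article>
<article class="o-hit"><img data-original="https://example.com/second.jpg"></article>
<article class="o-hit"><p>No image here</p></article>
"""

def image_url(article):
    """Return the image URL of an article tag, or None if there is none."""
    img = article.find("img")
    if img is None:
        return None
    # Tag.get() returns None for a missing attribute instead of raising KeyError.
    return img.get("data-original") or img.get("src")

soup = BeautifulSoup(HTML, "html.parser")
urls = [image_url(a) for a in soup.find_all("article", class_="o-hit")]
print(urls)
```

Using get() rather than attrs[...] means a missing attribute yields None, so one lookup can cover both the first item and the rest without special-casing.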

I was able to scrape all the news items like this:

import requests
from bs4 import BeautifulSoup
url='https://www.engadget.com/reviews/latest/page/10/'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
articles=soup.find_all('article',{"class":"o-hit"})
for article in articles:
    print("Heading: ", article.find('h2').text.strip())#heading
    print("Summary: ", article.find('p').text.strip())#summary
    print("Image Source:", article.find('img').attrs['data-original'])#image src
    print()

Output:

Heading:  Netflix will remove user reviews from its website next month
Summary:  Last year five-star ratings got the ax, and now written reviews will fade away too.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2F884e68f9a829f3a26db5b729f00ccd03%2F206508290%2FEnglish.jpg&client=amp-blogside-v2&signature=b37eb21e95cef8cebe1f3c741b8bb29eb3471dcc

Heading:  Smart ForTwo Electric Drive quick spin review
Summary:  The saddest way to spend $25,000.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2Fedbdfdfeff2e77567cd0c4a73484d108%2F206502307%2Fsmartfortwo.jpg&client=amp-blogside-v2&signature=a9fc05d80d4b4d8ba6ef33453510c138632bab81

Heading:  Vivo's all-screen NEX S is a frustrating glimpse of the future
Summary:  Spoiler alert: It's really cool, but don't bother importing one.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F29%2F5b36ac0e523dc352bd46785a%2F5b36aedc884c2354eb33d663_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=725c8033196a2ae3500e2144830d14b03e7abc0e

Heading:  Sonos Beam review: Smart features trump minor audio compromises
Summary:  Bringing the soundbar into the smart home era.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F27%2F5b32f579523dc352bd3f66f3%2F5b32fbf2884c2354eb33d62f_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=4ad311aeb5cb23907fd99ec12d962b148646163d

Heading:  BlackBerry KEY2 review: The undisputed keyboard king
Summary:  This is the best Android-powered BlackBerry, if that means anything to you.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F26%2F5b3188ee523dc36212a7ff02%2F5b318be5802b94347b7e586b_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=5438cdf814480be5856d38db73695f86ade186ea

Heading:  Amazon Echo Look review: Good selfie taker, so-so stylist
Summary:  An AI is no match for my style instincts.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F25%2F5b30cbfce880db6107cb7ad0%2F5b30cde61aa5fc22c7bbf187_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=308e9f00afcb968da05823ce0d0718ccc6e43cb4

Heading:  Mitsubishi’s Outlander Plug-In Hybrid is an understated surprise
Summary:  Mitsubishi is back, even though it actually never left.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bc80f523dc36212a2be79%2F5b2bc8a6884c2319c410c008_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=a00b8466fa281051de4d64b1223fe99f97315985

Heading:  Amazon Fire TV Cube review: Alexa still needs work as a TV guide
Summary:  This device was bound to be made at some point, but is it worth it?
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b2bb81edbaab36faf00ed0e%2F5b2bddfb884c2319c410c00c_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=baa2db64e12d013ab712d823238fc3efeee693f8

Heading:  HTC U12+ review: Fundamentally flawed
Summary:  The phone's pressure-sensitive power and volume keys are kinda the worst.
Image Source: https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-06%2F21%2F5b28cd94f50775726418990a%2F5b2bd7d4b46ab33c496c1607_1920x1080_U_v1.jpg&client=amp-blogside-v2&signature=8518ce5c141fb85b935794fbd3bd283d32508484
from bs4 import BeautifulSoup
import requests
import time



for page in range(1,11):

    url = 'https://www.engadget.com/reviews/latest/page/%s/' %(page)
    time.sleep(10)

    print ('Page: %s' %(page))    
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    articles = soup.find_all('article',class_='o-hit')

    for article in articles:

        img_src = article.find('div',class_ ='o-rating_thumb c-white').img['data-original'] 
        headline = article.h2.text.strip()
        summary = article.find('p',class_ ='mt-15@m+ t-d5@m- t-d5@tp+ c-gray-3').text

        print(headline)
        print(summary)
        print(img_src)
        print('\n')

Output (which you can then write to a CSV file):

Page: 1
Surface Studio 2 review: A better all-in-one PC twist
But Microsoft could still go further.
https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-12%2F20%2F5c1bfc61c0e0af2854a7c103%2F5c1bfcaa3278fb29ca5cf249_o_U_v1.jpg&client=amp-blogside-v2&signature=3c4be6997ee8e877ee7f62ad8d52409232f02ce9


Nikon Z6 review: The best full-frame mirrorless camera for video
10-bit external video, in-body stabilization and a full sensor readout.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1111%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1111%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Ff152c7e0-045b-11e9-bfc7-4d357297511c%26client%3Da1acac3e1b3290917d92%26signature%3Dd3865a04724a29f29b2bd3f6941dcddf9d494bcc&client=amp-blogside-v2&signature=7a72e03b6995fc31e3415c68279a4b038c979ea8


Brava's light-powered smart oven is too expensive to make sense
Preset cook programs can be limiting as well. 
https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-12%2F19%2F5c1a9bf4fcd67b52d409586b%2F5c1a9f935b5d1b6ddb3a80d6_o_U_v1.png&client=amp-blogside-v2&signature=5f899a54c84b54b651d90824b67587f29677c858


PlayStation Classic review: A disappointing dose of nostalgia
Sony learned nothing from Nintendo.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C928%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C928%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252F815fbe10-fd95-11e8-bde6-bdfd52a1c25a%26client%3Da1acac3e1b3290917d92%26signature%3D01ffdb2c7bc74497ae5f2a734feab08629996703&client=amp-blogside-v2&signature=98ae8e659929f4dd7f97d886d00b65303ab18059


Moment's 58mm lens is a portrait machine
The company's new tele lens fixes everything that was wrong with its 2014 model
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fresize%3D2000%252C2000%252Cshrink%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252F39511ab0-f8d1-11e8-bbae-a119d499ba30%26client%3Da1acac3e1b3290917d92%26signature%3Dc246fb260ca480d6a9a4acb6f91cf10974f32c9a&client=amp-blogside-v2&signature=7dd3cc98cb10045b323dba54e86f2c70c2aa99b4


’Super Smash Bros. Ultimate’ is the perfect nostalgia bomb
It's a must-own for every Nintendo Switch owner.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C900%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C900%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Fd8ab34b0-f90d-11e8-befe-815318929941%26client%3Da1acac3e1b3290917d92%26signature%3D4a5feb09e95c4f55ad0e1f8d6322734588ff76f6&client=amp-blogside-v2&signature=3a61128cae6560efafec7efc14211c2187851234


Mercedes’ GLE sports impressive suspension technology
MBUX and the new E-Active Body Control suspension enhance an already splendid SUV.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fresize%3D2000%252C2000%252Cshrink%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-12%252Fc86bc1f0-f7fa-11e8-b77f-844a0908350f%26client%3Da1acac3e1b3290917d92%26signature%3D26e4fe19d43ad8d8f349edc95baf1790e10deecd&client=amp-blogside-v2&signature=f74436bc107af53e468d2d9ece4e88b37cff0a10


Mighty Vibe review: A much improved iPod Shuffle for Spotify
The second-gen model makes some much-needed improvements.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252F1ffaaf40-f4e9-11e8-bbdf-b9d9c8fe5ee1%26client%3Da1acac3e1b3290917d92%26signature%3D18cbcc4c82d1c7d9f542dc80c57a4486c318526a&client=amp-blogside-v2&signature=e6d3badcf43ac938173f5353aa3a750c82b72bc3


Google Pixel Slate review: The burden of bad software
Back to the drawing board, Google.
https://o.aolcdn.com/images/dims?thumbnail=386%2C217&quality=80&image_uri=https%3A%2F%2Fimg.vidible.tv%2Fprod%2F2018-11%2F30%2F5c008cf7600c9a1890e1305b%2F5c008d483a4f8c07678d8eb0_o_U_v1.jpg&client=amp-blogside-v2&signature=bbe180eb62cfe43f5241e74c1b7328c70da134c9


Page: 2
Dolby Dimension review: Excellent sound, exorbitant price
At $599, these headphones are too expensive for most, no matter how good they are.
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252Fc6b58350-f250-11e8-8fff-afce4122ee12%26client%3Da1acac3e1b3290917d92%26signature%3D1bbe23e198c73bba2cd252a57f0598bcc32b374b&client=amp-blogside-v2&signature=65f2ee40f3cb346e473d6b188a6939317b4bebad


Nikon Z7 review: Great photos, great video, imperfect autofocus
It’s a strong full-frame mirrorless debut.

https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1019%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1019%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252F07cabcc0-ed6f-11e8-af6d-2e14e29f20d0%26client%3Da1acac3e1b3290917d92%26signature%3D0a9704430483d81de5ebc6d4df50feca2228634e&client=amp-blogside-v2&signature=acf075d9ba15703a43abfcb705bd2ccd73e39ac7


All of Amazon's new Echo speakers reviewed
So how good do the new Echo Plus, Dot and Sub really sound?
https://o.aolcdn.com/images/dims?thumbnail=300%2C200&quality=80&image_uri=https%3A%2F%2Fo.aolcdn.com%2Fimages%2Fdims%3Fcrop%3D1600%252C1067%252C0%252C0%26quality%3D85%26format%3Djpg%26resize%3D1600%252C1067%26image_uri%3Dhttps%253A%252F%252Fs.yimg.com%252Fos%252Fcreatr-uploaded-images%252F2018-11%252Fa1d1ba20-ed16-11e8-b9ad-ed849065b748%26client%3Da1acac3e1b3290917d92%26signature%3D55c2bb811c59d95f942f200aec6af5fb35d6e0fc&client=amp-blogside-v2&signature=3a5c7fdde7829033245cec14439a5463c077d702

... ... ...
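Writing these values to CSV could look something like the sketch below. The sample rows, the column names, the helper name write_rows, and the file name reviews.csv mentioned in the comment are all placeholders, not values from the scraped pages.

```python
import csv
import io

# Hypothetical rows standing in for the (headline, summary, img_src) values
# collected in the scraping loop; the URLs are placeholders.
rows = [
    ("First headline", "First summary.", "https://example.com/a.jpg"),
    ("Second, with a comma", "Second summary.", "https://example.com/b.jpg"),
]

def write_rows(handle, rows):
    """Write a header line followed by one CSV row per article."""
    writer = csv.writer(handle)
    writer.writerow(["headline", "summary", "img_src"])
    writer.writerows(rows)

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("reviews.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_rows(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Opening a real file with newline="" lets the csv module control line endings itself, and the writer quotes any field that contains a comma, so headlines with commas stay in one column.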
