
How to find image links using web scraping

I want to parse the image links of web pages. I have tried the code below, but it shows an error.

#!usr/bin/python
import requests
from bs4 import BeautifulSoup
url=raw_input("enter website")
r=requests.get("http://"+ url)
data=r.img
soup=BeautifulSoup(data)
for link in soup.find_all('img'):
    print link.get('src')

Error:

File "img.py", line 6, in <module>
    data=r.img
AttributeError: 'Response' object has no attribute 'img'

Your error is that you are trying to get `img` from the `Response` object rather than from the page's source code. A `Response` has no `img` attribute; the HTML source is available as `r.text`.

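r = requests.get("http://" + url)
# data = r.img  # wrong: `Response` has no `img` attribute

# use `text` instead of `img`: the HTML source is in `Response.text`
data = r.text

# parse the HTML and print the `src` of every <img> tag
# (naming the parser explicitly avoids BeautifulSoup's "no parser specified" warning)
soup = BeautifulSoup(data, "html.parser")
for link in soup.find_all('img'):
    print(link.get('src'))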
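
For comparison, here is a minimal sketch of the same idea (get the HTML as a string, then collect the `src` of every `<img>` tag) using only the Python 3 standard library's `html.parser` instead of BeautifulSoup. The class name `ImgSrcCollector`, the function `extract_img_sources`, and the sample HTML are made up for illustration; in the code above, the string passed in would be `r.text`.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags that have no src attribute
                self.sources.append(src)


def extract_img_sources(html):
    """Return the src values of all <img> tags found in an HTML string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return collector.sources


# Made-up sample page: one image with a src, one without
sample = '<html><body><img src="/static/logo.png"><p>text</p><img alt="no source"></body></html>'
print(extract_img_sources(sample))
```

This skips tags without a `src`, which is roughly what checking `link.get('src')` for `None` would do in the BeautifulSoup version.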

Below is a working version that uses urllib.request and BeautifulSoup:

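import urllib.request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = 'http://python.org'
with urllib.request.urlopen(url) as response:
    html = response.read()

soup = BeautifulSoup(html, 'html.parser')

# src=True skips <img> tags that have no src attribute
for link in soup.find_all('img', src=True):
    print('img src as written in the page')
    print(link['src'])
    print('absolute URL')
    # urljoin resolves the src against the page URL; plain concatenation
    # breaks when src is already absolute or lacks a leading slash
    print(urljoin(url, link['src']))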
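
How a `src` value turns into an absolute URL depends on its form: it may be relative to the page, relative to the site root, protocol-relative, or already absolute. A small sketch of how `urllib.parse.urljoin` from the standard library handles each case; the page URL, the example paths, and the helper `resolve_all` are made up for illustration.

```python
from urllib.parse import urljoin

# Assumed example page URL (note the trailing slash: it marks a directory)
page_url = "http://python.org/about/"

# Made-up src values covering the common forms
examples = [
    "logo.png",                      # relative to the page
    "/static/img/logo.png",          # relative to the site root
    "//cdn.example.com/logo.png",    # protocol-relative
    "https://example.com/logo.png",  # already absolute
]


def resolve_all(base, sources):
    """Resolve each src against the base URL (hypothetical helper)."""
    return [urljoin(base, src) for src in sources]


for src, resolved in zip(examples, resolve_all(page_url, examples)):
    print(src, "->", resolved)
```

In the loop above, `urljoin(url, link['src'])` could stand in wherever an absolute image URL is needed.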

I hope this helps you :-)

