
How can I programmatically scrape an image from another website?

A few years ago I helped someone put together a webpage (for local personal use only, not served to the world) that aggregates outdoor webcam photos from several of his favorite websites. It's a time-saver for viewing multiple websites at once. We had it easy when the images on those websites had fixed URLs, and we were able to write some JavaScript code when the URLs changed predictably (e.g., when the URL had a date in it). But now he'd like to add an image whose filename changes seemingly at random, and I don't know how to handle that. Basically, I'd like to:

  1. Programmatically visit another website to find the URL of a particular image.
  2. Insert that URL into my webpage with an <img> tag.

I realize this is probably a confusing and unusual question. I'm willing to help clarify as much as possible. I'm just not sure how to ask for what this guy wants to do.

Update: David Dorward mentioned that doing this with JavaScript violates the Same Origin Policy. I'm open to suggestions for other ways to approach this problem.

It's probably a big fat violation of copyright.

The picture is most likely contained within a page, so just visit that page regularly and parse the img tag. Make sure that the random bit you mentioned is not just a random query parameter added to force browsers to fetch a fresh image instead of using a cached version.
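One way to check for a cache-busting parameter is to compare the image URL from two visits with the query string removed. A minimal sketch in Python, assuming the "random" part lives in the query string; the example.com URLs and the `t` parameter are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def same_image(url_a, url_b):
    """True if two URLs differ only in query string or fragment."""
    return strip_query(url_a) == strip_query(url_b)


# Hypothetical URLs copied from the page on two different visits.
first = "http://example.com/cams/peak.jpg?t=1699999991"
second = "http://example.com/cams/peak.jpg?t=1700000042"

print(same_image(first, second))
```

If this reports a match for the real URLs, the path is effectively fixed and the stripped URL may be usable directly in the <img> tag.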

  1. Fetch the HTML of the remote page using cross-domain AJAX.
  2. Then parse it to get the URLs of the images of interest.
  3. Then, for each URL, emit <img src="URL" />.
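The same three steps can also be run outside the browser, where the Same Origin Policy does not apply. Below is a rough sketch using only the Python standard library. The function names, the `marker` substring used to pick out the image of interest, and the example.com URLs are all assumptions for illustration, not part of any existing page or API:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def fetch_html(page_url):
    """Step 1: download the remote page (not called in the demo below)."""
    with urlopen(page_url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def find_image_urls(page_html, page_url, marker):
    """Step 2: return absolute URLs of images whose src contains marker."""
    parser = ImgCollector()
    parser.feed(page_html)
    return [urljoin(page_url, s) for s in parser.sources if marker in s]


def img_tag(url):
    """Step 3: build an <img> tag with the URL safely escaped."""
    return '<img src="{}">'.format(escape(url, quote=True))


# Demo on a made-up page so the sketch runs without network access.
sample_html = (
    '<html><body><img src="/logo.png">'
    '<img src="cams/peak_8f3a2c.jpg" alt="Peak"></body></html>'
)
urls = find_image_urls(sample_html, "http://example.com/webcams/index.html", "cams/")
for url in urls:
    print(img_tag(url))
```

For a real page, `fetch_html(page_url)` would supply the HTML, and the printed tags could be written into the local aggregate page by a scheduled script.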

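If you're using PHP in your project, you can use the cURL library to fetch the other website's content, then parse it with a regular expression to extract the image URL from the source.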
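The regular-expression step might look something like the following, sketched here in Python rather than PHP. The pattern and the sample markup are assumptions for illustration; a regex like this is fragile (it ignores unquoted attributes and will also match attributes such as data-src), so it is only reasonable for a page whose markup is known and stable:

```python
import re

# Matches <img ... src="..."> or <img ... src='...'>, case-insensitively,
# and captures the value of the src attribute.
IMG_SRC = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def image_sources(page_html):
    """Return every quoted src value found in <img> tags, in page order."""
    return IMG_SRC.findall(page_html)


# Made-up markup standing in for the fetched page source.
sample = '<p><IMG alt="cam" SRC="http://example.com/cam/x91k.jpg"> <img src=\'b.png\'></p>'
print(image_sources(sample))
```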

You have a Python question in your profile, so I'll just say that if I were trying to do this, I'd go with Python and Beautiful Soup. It has the added advantage of being able to handle invalid HTML.


 