How to extract an image URL with Python?

Question

I'm trying to extract image URLs from this HTML:

<div class="theme-screenshot one attachment-theme-screenshot size-theme-screenshot wp-post-image loaded" data-featured-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" data-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" style='background-image: url("https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg");'></div>

How can I get the URL stored in the data-src attribute?

I'm using Beautiful Soup and its find function, but I don't know how to extract the link because there is no img tag as usual.

Thank you in advance for your time.

Answer 1

You can select the div with a CSS selector and read its data-src attribute with .get():

from bs4 import BeautifulSoup

html = """
<div class="theme-screenshot one attachment-theme-screenshot size-theme-screenshot wp-post-image loaded" data-featured-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" data-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" style='background-image: url("https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg");'></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching element (or None if nothing matches)
url = soup.select_one(
    "div.theme-screenshot.one.attachment-theme-screenshot.size-theme-screenshot.wp-post-image.loaded"
).get("data-src")

print(url)

This prints:

https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg

Documentation for Beautiful Soup (bs4) can be found at:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/
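If the page contains several such divs, the same idea extends to collecting every data-src value. The following is a minimal sketch, assuming a made-up HTML fragment with example.com URLs (not the asker's real page); it uses the attribute selector div[data-src], so elements without that attribute are skipped rather than producing None.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration: two divs with data-src, one without.
html = """
<div class="theme-screenshot" data-src="https://example.com/a.jpg"></div>
<div class="theme-screenshot" data-src="https://example.com/b.jpg"></div>
<div class="theme-screenshot"></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div[data-src]" matches only divs that carry a data-src attribute,
# so the third div is skipped and indexing with ["data-src"] is safe.
urls = [div["data-src"] for div in soup.select("div[data-src]")]

print(urls)
```

Matching on the attribute instead of the full class list is a design choice: it keeps working if the site adds or removes one of the classes.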

Answer 2

If you can't use an HTML parser for some reason, you can use a regular expression instead.

import re

text = '''
<div class="theme-screenshot one attachment-theme-screenshot size-theme-screenshot wp-post-image loaded" data-featured-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" data-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" style='background-image: url("https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg");'></div>
'''

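# Capture the quoted value of data-src; [^"]* stops at the closing quote,
# so the match does not depend on what follows the attribute.
parsed = re.search(r'(?<![\w-])data-src="([^"]*)"', text).group(1)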

print(parsed)
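When the text holds more than one element, re.findall with a capture group returns all the values at once. This is a sketch under assumptions: the HTML fragment and the example.com URLs are invented for illustration, and the pattern expects double-quoted attribute values as in the question.

```python
import re

# Hypothetical fragment for illustration, not the asker's real page.
text = '''
<div data-featured-src="https://example.com/featured.jpg" data-src="https://example.com/a.jpg"></div>
<div data-src="https://example.com/b.jpg"></div>
'''

# (?<![\w-]) rejects matches that are the tail of a longer attribute name;
# ([^"]*) captures everything up to the closing double quote.
pattern = re.compile(r'(?<![\w-])data-src="([^"]*)"')

# With a single capture group, findall returns a list of the captured strings.
urls = pattern.findall(text)

print(urls)
```

A regular expression like this is fine for a quick extraction, but it will miss single-quoted or unquoted attribute values, which an HTML parser handles.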
