Using a Python regular expression to find an image path

Question

I have a variable like the one below:

var = '<img src="path_1"><p>Words</p><img src="path_2>'

It's a string, but it obviously contains HTML elements. How do I get only the first path (i.e. path_1) using a regex?

I am trying something like this:

match = re.match(r'src=\"[\w-]+\"', var)
print match.group(0)

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

Any help is appreciated.

Answer 1

You should use an HTML parser like BeautifulSoup:

>>> from bs4 import BeautifulSoup
>>> var = '<img src="path_1"><p>Words</p><img src="path_2>'
>>> soup = BeautifulSoup(var, "html.parser")
>>> soup.img["src"]
'path_1'

As for the regex approach, you need the following changes to make it work:

switch to re.search(), because re.match() only matches at the beginning of the string
add a capturing group to capture the src value
there is no need to escape the double quotes

Fixed version:

>>> import re
>>> re.search(r'src="([\w-]+)"', var).group(1)
'path_1'
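
The difference between the two calls can be checked directly. This is a minimal sketch using only the standard library and the string from the question; the names `pattern`, `anchored` and `anywhere` are illustrative and not part of the original answer:

```python
import re

var = '<img src="path_1"><p>Words</p><img src="path_2>'
pattern = r'src="([\w-]+)"'

# re.match() only tries the pattern at position 0, where the string starts with '<img'.
anchored = re.match(pattern, var)

# re.search() scans forward and stops at the first position where the pattern fits.
anywhere = re.search(pattern, var)

print(anchored)           # None, which is why calling .group() raised AttributeError
print(anywhere.group(0))  # the whole match: src="path_1"
print(anywhere.group(1))  # the first capturing group: path_1
```

group(0) always returns the full matched text, so the capturing group is what isolates the path from the surrounding src="...".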

Answer 2

As suggested in the comments, use search(), since match() only tries to match your regular expression at the beginning of the string. You can also use a named capturing group to make the code more readable:

import re

var = '<img src="path_1"><p>Words</p><img src="path_2>'
# (?P<path1>...) names the group so it can be read back by name
match = re.search(r'src=\"(?P<path1>[\w-]+)\"', var)
if match:
    print(match.group('path1'))

Output:

path_1
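
The same idea can be wrapped in a small helper that returns None instead of raising when nothing matches. This is only a sketch: the function name `first_src` and the constant `SRC_RE` are hypothetical, and the character class `[^"]+` is an assumption that differs from the `[\w-]+` used above so that paths containing `/` or `.` are accepted too.

```python
import re

# Assumption: accept any run of characters up to the closing double quote.
# [\w-]+ would reject realistic paths such as images/a.png.
SRC_RE = re.compile(r'src="(?P<path>[^"]+)"')

def first_src(html):
    """Return the first src value found in html, or None if there is none."""
    match = SRC_RE.search(html)
    return match.group('path') if match else None

var = '<img src="path_1"><p>Words</p><img src="path_2>'
print(first_src(var))                         # path_1
print(first_src('<p>no image</p>'))           # None
print(first_src('<img src="images/a.png">'))  # images/a.png
```

Checking the match object before calling .group() is what avoids the AttributeError from the question.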

Answer 3

Try:

import re
path1 = re.search(r'<img\s+src="(.*?)"><p>', var).group(1)  # path_1

BeautifulSoup is convenient, but very slow.
HTMLParser is a lot faster, but painful to use.
re is the fastest option and, in my opinion, worth it for stateless use cases.
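
For comparison, here is roughly what the HTMLParser route looks like. It is a sketch rather than a recommended implementation: in Python 3 the class lives in the standard-library module html.parser, and the subclass name `FirstImgSrc` and its `src` attribute are made up for this example.

```python
from html.parser import HTMLParser

class FirstImgSrc(HTMLParser):
    """Remember the first src attribute found on an <img> tag."""

    def __init__(self):
        super().__init__()
        self.src = None  # stays None if no <img> with a src is seen

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased.
        if tag == 'img' and self.src is None:
            self.src = dict(attrs).get('src')

var = '<img src="path_1"><p>Words</p><img src="path_2>'
parser = FirstImgSrc()
parser.feed(var)
print(parser.src)  # path_1
```

The subclass and callback are the extra work this answer calls painful, compared with the one-line regex.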

If the target text is stateful, i.e. there is a lot of nesting and capturing the semantics is important, use an existing parser instead of implementing a state machine (that is, a parser) yourself. I would strongly suggest lxml for parsing HTML and XML. It is a little less convenient than bs4, but comparable to re in speed.

Answer summary

Answer 1: 5, ACCEPTED, 2016-04-26 15:08:00
Answer 2: 2, 2016-04-26 15:10:15
Answer 3: 1, 2016-04-26 15:36:34
