Beautiful Soup - how to extract a string from an object

Question

I am learning Beautiful Soup. I have succeeded in tracking down the HTML lines that I need. My next step is to extract an Id value from those lines.

The code to find the lines looks like this:

object = soup_station.find('img',{'src': re.compile("^Controls")})

If I now print object I will get this, for example:

<img src="Controls/RiverLevels/ChartImage.jpg?Id=471&amp;ChartType=Histogram" id="StationDetails_Chart1_chartImage" alt="Current river level" />

The part I want to extract from the line above is the "471" after Id=.

I tried using re.search on object but it seems that object is not text.

Any help would be much appreciated!

Answer 1

You can adapt the following:

s = '<img src="Controls/RiverLevels/ChartImage.jpg?Id=471&amp;ChartType=Histogram" id="StationDetails_Chart1_chartImage" alt="Current river level" />'

import re
# Python 3 import; on Python 2 use: from urlparse import urlsplit, parse_qs
from urllib.parse import urlsplit, parse_qs

from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(s, 'html.parser')
# Find the node with a src starting with Controls
node = soup.find('img', {'src': re.compile("^Controls")})
# Break up the URL in the src attribute
url_split = urlsplit(node['src'])
# Parse the query string into a dict that maps each parameter to a list of values
qs = parse_qs(url_split.query)
# Display the value for `Id`
print(qs['Id'][0])
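
To make the two URL steps above easier to follow, here is a minimal sketch of what `urlsplit` and `parse_qs` return for this `src` value. It assumes Python 3 and uses only the standard library. The `src` string is written by hand here, in the form Beautiful Soup returns it (with `&amp;` already decoded to `&`), and the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

# The src value as Beautiful Soup returns it: the entity &amp; is already decoded to &
src = "Controls/RiverLevels/ChartImage.jpg?Id=471&ChartType=Histogram"

parts = urlsplit(src)
print(parts.path)    # Controls/RiverLevels/ChartImage.jpg
print(parts.query)   # Id=471&ChartType=Histogram

# parse_qs maps each parameter name to a list of its values
params = parse_qs(parts.query)
print(params)        # {'Id': ['471'], 'ChartType': ['Histogram']}

# .get() with a default avoids a KeyError if the parameter is absent
station_id = params.get("Id", [None])[0]
print(station_id)    # 471
```

Each value is a list because a query string may repeat a parameter, which is why the answer indexes with `[0]`.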

Answer 2

Make sure you run the regex search on the tag's src attribute, which is a plain string, rather than on the tag object itself. You can give this a try:

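import re

# Reuse the filter from the question so the right <img> tag is selected
ele = soup_station.find('img', {'src': re.compile("^Controls")})
src = ele['src']

# Capture the digits that follow "?Id="
match = re.search(r'\?Id=(\d+)', src)
# re.search returns None when nothing matches, so guard before calling group()
ele_id = match.group(1) if match else None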
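
As for why `re.search` failed in the question: `find` returns a Beautiful Soup `Tag`, not a string. The sketch below, assuming Python 3 and the built-in `html.parser`, shows that failure and two ways to get text that `re` accepts. The shortened HTML snippet and the variable names are illustrative.

```python
import re

from bs4 import BeautifulSoup

html = '<img src="Controls/RiverLevels/ChartImage.jpg?Id=471&amp;ChartType=Histogram" alt="Current river level" />'
tag = BeautifulSoup(html, "html.parser").find("img", {"src": re.compile("^Controls")})

# Passing the Tag itself to re.search raises TypeError because it is not a string
try:
    re.search(r"Id=(\d+)", tag)
    tag_is_text = True
except TypeError:
    tag_is_text = False

# Option 1: search only the src attribute, which is a plain str
from_attr = re.search(r"\?Id=(\d+)", tag["src"])

# Option 2: serialize the whole tag back to markup and search that
from_markup = re.search(r"\?Id=(\d+)", str(tag))

print(tag_is_text)            # False
print(from_attr.group(1))     # 471
print(from_markup.group(1))   # 471
```

Searching the attribute is usually the tidier choice, since the pattern cannot accidentally match text elsewhere in the tag.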

solution1
0 ACCEPTED 2013-06-18 21:18:26

solution2
0 2013-06-18 21:23:38
