How to scrape this using Beautiful Soup in Python?

Question

<a href="http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html">Why Multi-armed Bandit algorithms are superior to A/B testing (with Math)</a>, <a href="user?id=yummyfajitas">yummyfajitas</a>, <a href="item?id=4060658">11 comments</a>,

How can I scrape an HTML page that contains the HTML above and extract data like this:

link = http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html
text = Why Multi-armed Bandit algorithms are superior to A/B testing (with Math)
user_id = yummyfajitas
item_id = 4060658

Answer 1

If it's in this exact same order every time:

html = r'<a href="http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html">Why Multi-armed Bandit algorithms are superior to A/B testing (with Math)</a>, <a href="user?id=yummyfajitas">yummyfajitas</a>, <a href="item?id=4060658">11 comments</a>, '

# Beautiful Soup 4 on Python 3; with Beautiful Soup 3 the import was: from BeautifulSoup import BeautifulSoup
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # parse the html
bowl = soup.find_all('a')  # find all links in the html

link = bowl[0]['href']  # href of the first 'a' tag
text = bowl[0].get_text()  # visible text of the first 'a' tag
user_id = bowl[1]['href'].split('?id=')[1]  # split on '?id=' and take the second value; [-1] works too
item_id = bowl[2]['href'].split('?id=')[1]  # same split for the third link

print('link:', link)
print('text:', text)
print('user_id:', user_id)
print('item_id:', item_id)
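
If the three links are not guaranteed to appear in the same order, one option is to pick each link by what its href looks like instead of by index. This is only a minimal sketch, assuming Python 3 and Beautiful Soup 4 (the bs4 package). The name extract_fields is a hypothetical helper, and treating the first absolute http(s) URL as the article link is an assumption about the page, not something stated in the question:

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

html = ('<a href="http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html">'
        'Why Multi-armed Bandit algorithms are superior to A/B testing (with Math)</a>, '
        '<a href="user?id=yummyfajitas">yummyfajitas</a>, '
        '<a href="item?id=4060658">11 comments</a>, ')


def extract_fields(fragment):
    """Hypothetical helper: classify each link by its href, not by position."""
    soup = BeautifulSoup(fragment, 'html.parser')
    result = {}
    for a in soup.find_all('a', href=True):
        parsed = urlparse(a['href'])
        ids = parse_qs(parsed.query).get('id')  # e.g. ['yummyfajitas'], or None
        if parsed.path == 'user' and ids:
            result['user_id'] = ids[0]
        elif parsed.path == 'item' and ids:
            result['item_id'] = ids[0]
        elif parsed.scheme in ('http', 'https'):
            # Assumption: the first absolute URL is the article link.
            result.setdefault('link', a['href'])
            result.setdefault('text', a.get_text())
    return result


fields = extract_fields(html)
for key in ('link', 'text', 'user_id', 'item_id'):
    print(key + ':', fields.get(key))
```

A field whose link is missing comes back as None from fields.get(...), so a row without, say, a comments link does not raise an IndexError the way indexing into the list of tags would.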

solution1
1 ACCEPTED 2012-06-03 16:27:36
