Problems scraping a web page using Python

Question

Hi, I'm quite new to Python and my boss has asked me to scrape this data. It's not my strong point, so I was wondering how I would go about it.

The text I'm after (the part inside the quote marks) also changes every few minutes, so I'm not sure how to locate it.

I am using Beautiful Soup and lxml at the moment, but if there are better alternatives I'm happy to try them.

This is the inspected element of the webpage:

<div class="sometext">
<h3> somemoretext </h3>
<p>
<span class = "title" title="text i want">text i want</span>
<br>
</p>

I have tried using:

from lxml import html
import requests
from bs4 import BeautifulSoup
page = requests.get('the url')
soup = BeautifulSoup(page.text)
r = soup.findAll('//span[@class="title"]/text()')
print r

Thank you in advance, any help would be appreciated!

Answer 1

Perhaps find is the method you really need, since you're only ever looking for one element (see the BeautifulSoup docs).

r = soup.find('div', 'sometext').find('span','title')['title']
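
A minimal sketch of the same lookup, run against an inline copy of the markup from the question and with a guard for the case where an element is missing (chaining find calls raises an error if the first one returns None). SAMPLE_HTML and extract_title are illustrative names, not part of the original answer; on the real page the HTML would come from page.text.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the page; the real HTML would come from requests.get(...).text
SAMPLE_HTML = """
<div class="sometext">
  <h3> somemoretext </h3>
  <p>
    <span class="title" title="text i want">text i want</span>
    <br>
  </p>
</div>
"""


def extract_title(html_text):
    """Return the title attribute of the first span.title inside div.sometext, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    # A string as the second positional argument is matched against the CSS class
    container = soup.find("div", "sometext")
    if container is None:
        return None
    span = container.find("span", "title")
    if span is None:
        return None
    return span.get("title")


print(extract_title(SAMPLE_HTML))
```

Because the lookup keys on the class names rather than on the text itself, it should keep working when the text changes every few minutes, as long as the page structure stays the same.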

Answer 2

First do this to see what is actually in the soup:

soup = BeautifulSoup(page.text)  # pass the response body, not the Response object
print(soup)

That way you can double-check that you are actually dealing with what you think you are dealing with.

Then do this:

r = soup.findAll('span', attrs={"class":"title"})
for span in r:
    print(span.text)

This gets all the span tags with class="title", and .text then gives the text between the opening and closing tags of each one.

Edited to Add

Note that esecules' answer gets you the value of the title attribute on the tag ( <span class = "title" title="text i want"> ), whereas mine gets the text between the tags ( <span class = "title" >text i want</span> ).
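
To make the difference concrete, here is a small sketch that reads both values from the same span. The markup is hypothetical: the attribute and the text are deliberately different here (on the page in the question they happen to be identical), and html_text and results are illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup where the attribute and the text differ, so the two are distinguishable
html_text = '<p><span class="title" title="from attribute">from text</span></p>'
soup = BeautifulSoup(html_text, "html.parser")

results = []
for span in soup.find_all("span", attrs={"class": "title"}):
    # span["title"] reads the title attribute; span.text reads the text between the tags
    results.append((span["title"], span.text))

print(results)
```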

Answer 3

If you're familiar with XPath and you don't need features specific to BeautifulSoup, then lxml alone is enough (and possibly better, since lxml is known to be faster):

from lxml import html
import requests

page = requests.get('the url')
root = html.fromstring(page.text)
r = root.xpath('//span[@class="title"]/text()')
print(r)
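
The same XPath approach can be tried offline against a copy of the markup from the question. This is only a sketch: SAMPLE_HTML stands in for page.text, and the second query (for the title attribute) is an addition for illustration rather than part of the answer above.

```python
from lxml import html

# Inline stand-in for page.text, copied from the markup in the question
SAMPLE_HTML = (
    '<div class="sometext"><h3> somemoretext </h3><p>'
    '<span class="title" title="text i want">text i want</span><br>'
    '</p></div>'
)

root = html.fromstring(SAMPLE_HTML)

# Text between the tags of every span whose class is exactly "title"
texts = root.xpath('//span[@class="title"]/text()')
# Value of the title attribute on the same spans
titles = root.xpath('//span[@class="title"]/@title')

print(texts)
print(titles)
```

This also explains why the attempt in the question found nothing: BeautifulSoup's findAll expects a tag name and attribute filters, not an XPath expression, so the XPath string was treated as a tag name that never matches.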

3 answers

solution1
0 2015-12-08 22:49:43

solution2
0 ACCEPTED 2015-12-08 22:52:59

solution3
0 2015-12-09 01:43:35
