Beautiful Soup doesn't find all the tags

Question

I am trying to get the number of citations for a specific profile from Google Scholar, using Python and BeautifulSoup.

The citation counts are in the "Citation indices" table. My code returns only nine elements, although more elements with the same tag and class appear when you click on the graph.

What's the problem?

from urllib import urlopen
from bs4 import BeautifulSoup
from lista_url import *

# Google Scholar profile page
url = 'https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el'
webpage = urlopen(url)
soup = BeautifulSoup(webpage)
for t in soup.findAll('span', {"class": "gsc_g_al"}):
    print t.text

Answer 1

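The larger citations table you appear to be looking for is loaded asynchronously by JavaScript (an AJAX request), so it is not part of the HTML that urlopen() returns. Because urlopen() does not execute JavaScript, you have to make that second request yourself in your own code.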

The URL for the AJAX request simply adds a view_op=citations_histogram parameter:

url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
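If you would rather derive the histogram URL from the profile URL than hard-code it, something like the following should work. It is only a sketch, written for Python 3 (where the URL helpers live in urllib.parse), and the function name with_histogram_view is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_histogram_view(profile_url):
    """Return profile_url with view_op=citations_histogram in its query string.

    Hypothetical helper: the name is an assumption, only the parameter
    itself comes from the answer above.
    """
    parts = urlsplit(profile_url)
    # Keep the existing parameters (user, hl, ...) in their original order.
    params = dict(parse_qsl(parts.query))
    params["view_op"] = "citations_histogram"
    return urlunsplit(parts._replace(query=urlencode(params)))


profile_url = "https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el"
histogram_url = with_histogram_view(profile_url)
print(histogram_url)
```

Calling the helper on a URL that already carries the parameter leaves it unchanged, so it is safe to apply more than once.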

Fetching that URL returns 24 entries:

>>> url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
>>> webpage=urlopen(url)
>>> soup=BeautifulSoup(webpage)
>>> len(soup.find_all('span', class_='gsc_g_al'))
24
>>> [el.string for el in soup.find_all('span', class_='gsc_g_al')]
[u'2', u'5', u'1', u'4', u'9', u'6', u'2', u'2', u'2', u'7', u'23', u'15', u'21', u'12', u'26', u'20', u'38', u'32', u'6', u'38', u'38', u'39', u'87', u'10']
>>> [el.string for el in soup.find_all('span', class_='gsc_g_t')]
[u'1992', u'1993', u'1994', u'1995', u'1996', u'1997', u'1998', u'1999', u'2000', u'2001', u'2002', u'2003', u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011', u'2012', u'2013', u'2014', u'2015']
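The two lists above line up position by position (year labels from gsc_g_t, counts from gsc_g_al), so they can be combined into a single lookup. Here is a minimal sketch for Python 3, assuming both lists have the same length and order, as they do in the output above. The function name pair_years_with_counts is hypothetical, and the values are copied from that output rather than fetched live.

```python
def pair_years_with_counts(years, counts):
    """Map each year to its citation count, converting both to int.

    Assumes the two lists are in the same order; if they ever differ in
    length they cannot be matched up reliably, so raise instead of guessing.
    """
    if len(years) != len(counts):
        raise ValueError("years and counts differ in length")
    return {int(year): int(count) for year, count in zip(years, counts)}


# Values copied from the interpreter session above.
years = [str(year) for year in range(1992, 2016)]
counts = ["2", "5", "1", "4", "9", "6", "2", "2", "2", "7", "23", "15",
          "21", "12", "26", "20", "38", "32", "6", "38", "38", "39", "87", "10"]

citations_by_year = pair_years_with_counts(years, counts)
print(citations_by_year[2014])  # 87
```

In a real script, years and counts would come from the two find_all() calls shown above, using el.string for each element.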

Answer 1: accepted 2015-02-17 13:47:09