
BeautifulSoup doesn't find all the tags

I'm trying to get the citation counts for a specific Google Scholar profile using Python and BeautifulSoup.

These elements are in the citation indices table. My code returns only nine elements, but more elements with the same tag appear when you click on the graph.

What's the problem?

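from urllib.request import urlopen  # Python 3; on Python 2 this was "from urllib import urlopen"
from bs4 import BeautifulSoup
# from lista_url import *  # the asker's own module; nothing below uses it

url = 'https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el'  # Google Scholar profile
webpage = urlopen(url)
soup = BeautifulSoup(webpage, 'html.parser')  # name the parser explicitly
# Prints every citation-bar value present in the initial HTML (nine in the asker's run)
for t in soup.find_all('span', class_='gsc_g_al'):
    print(t.text)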

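The larger citations table you appear to be looking for is loaded asynchronously by JavaScript (an AJAX request). urlopen only downloads the HTML the server sends, and BeautifulSoup only parses that HTML; neither runs JavaScript. You'll have to make that request yourself in your own code.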

The URL for that AJAX request is the profile URL with an extra view_op=citations_histogram parameter:

url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
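If you want to derive that URL from any profile URL instead of editing it by hand, a minimal Python 3 sketch with the standard urllib.parse module could look like this. The function name histogram_url is hypothetical, and the view_op=citations_histogram parameter comes from this answer; Google Scholar may have changed the endpoint since.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def histogram_url(profile_url):
    """Return profile_url with view_op=citations_histogram added (hypothetical helper)."""
    parts = urlsplit(profile_url)
    # Keep the existing query parameters (user, hl, ...) in order, dropping
    # any view_op that is already there so it is not duplicated.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k != "view_op"]
    params.append(("view_op", "citations_histogram"))
    return urlunsplit(parts._replace(query=urlencode(params)))


profile = "https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el"
print(histogram_url(profile))
# https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram
```

Rebuilding the query with parse_qsl/urlencode, rather than appending a string, keeps the existing parameters intact and avoids a duplicate view_op if the input URL already has one.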

Fetching that URL returns 24 entries, one per year from 1992 to 2015:

>>> url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
>>> webpage=urlopen(url)
>>> soup=BeautifulSoup(webpage)
>>> len(soup.find_all('span', class_='gsc_g_al'))
24
>>> [el.string for el in soup.find_all('span', class_='gsc_g_al')]
[u'2', u'5', u'1', u'4', u'9', u'6', u'2', u'2', u'2', u'7', u'23', u'15', u'21', u'12', u'26', u'20', u'38', u'32', u'6', u'38', u'38', u'39', u'87', u'10']
>>> [el.string for el in soup.find_all('span', class_='gsc_g_t')]
[u'1992', u'1993', u'1994', u'1995', u'1996', u'1997', u'1998', u'1999', u'2000', u'2001', u'2002', u'2003', u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011', u'2012', u'2013', u'2014', u'2015']
