Beautiful Soup does not find all the tags

Question

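I am trying to get the citation counts for a specific profile from Google Scholar. I am using Python and BeautifulSoup.

The values are in the "Citation indices" table. The code I am using returns only 9 elements, although there are more elements with the same tag when you click on the graph.

What is the problem?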

from urllib import urlopen
from bs4 import BeautifulSoup
from lista_url import *

url = 'https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el'  # Scholar profile
webpage = urlopen(url)
soup = BeautifulSoup(webpage)
for t in soup.findAll('span', {"class": "gsc_g_al"}):
    a = t.text
    print a

Answer 1

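The larger citation table you appear to be looking for is loaded asynchronously by JavaScript (an AJAX request). The initial page only contains the short summary, so you have to make that extra request yourself in your own code.

The URL for the AJAX request is the profile URL with one extra parameter, view_op=citations_histogram: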

url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
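If you would rather not edit the URL string by hand, the parameter can be added with the standard library. This is a minimal Python 3 sketch; the helper name with_query_param is hypothetical and not part of any library, and it assumes each query key appears only once (repeated keys would be collapsed).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Return url with key=value set in its query string (hypothetical helper)."""
    parts = urlsplit(url)
    # Turn the existing query string into a dict, keeping parameter order.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[key] = value
    return urlunsplit(parts._replace(query=urlencode(query)))


profile_url = "https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el"
histogram_url = with_query_param(profile_url, "view_op", "citations_histogram")
print(histogram_url)
```

The printed URL is the same one as the hand-written line above.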

Requesting that URL returns 24 entries:

>>> url='https://scholar.google.gr/citations?user=aFYdVoYAAAAJ&hl=el&view_op=citations_histogram'
>>> webpage=urlopen(url)
>>> soup=BeautifulSoup(webpage)
>>> len(soup.find_all('span', class_='gsc_g_al'))
24
>>> [el.string for el in soup.find_all('span', class_='gsc_g_al')]
[u'2', u'5', u'1', u'4', u'9', u'6', u'2', u'2', u'2', u'7', u'23', u'15', u'21', u'12', u'26', u'20', u'38', u'32', u'6', u'38', u'38', u'39', u'87', u'10']
>>> [el.string for el in soup.find_all('span', class_='gsc_g_t')]
[u'1992', u'1993', u'1994', u'1995', u'1996', u'1997', u'1998', u'1999', u'2000', u'2001', u'2002', u'2003', u'2004', u'2005', u'2006', u'2007', u'2008', u'2009', u'2010', u'2011', u'2012', u'2013', u'2014', u'2015']


Answer 1 (accepted, 2015-02-17 13:47:09)
