Unable to get a table from a web page

Question

I am using BeautifulSoup to try to get the entire table of all 2000 companies from this URL:

https://www.forbes.com/global2000/list/#tab:overall

This is the code I wrote:

from bs4 import BeautifulSoup
import pandas as pd
import urllib.request

html_content = urllib.request.urlopen('https://www.forbes.com/global2000/list/#header:position')

soup = BeautifulSoup(html_content, 'lxml')
table = soup.find_all('table')[0]
new_table = pd.DataFrame(columns=range(0,7), index = [0])

row_marker = 0
for row in table.find_all('tr'):
   column_marker = 0
   columns = row.find_all('td')

   for column in columns:
      new_table.iat[row_marker,column_marker] = column.get_text()
      column_marker += 1
new_table

As a result, I only get the column names, not the table itself.

How can I get the whole table?

Answer 1

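The content is generated by JavaScript, so you have to use Selenium to mimic a browser and the scrolling, and then parse the page source with Beautiful Soup. Alternatively, in some cases such as this one, you can access the values by querying the site's AJAX API: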

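import requests

# Send a browser-like User-Agent with every request in the session.
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0'}

# AJAX endpoint that returns the list data as JSON.
target = 'https://www.forbes.com/ajax/list/data?year=2017&uri=global2000&type=organization'

with requests.Session() as s:
    s.headers.update(headers)
    data = s.get(target).json()  # parse the JSON response body

# Each item in data is a dict describing one company.
print([x['name'] for x in data[:5]])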

Output (first 5 items):

['3M', '3i Group', '77 Bank', 'AAC Technologies Holdings', 'ABB']
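
Since the goal in the question was a full table rather than a list of names, here is a minimal sketch of turning such a list of dicts into tabular form using only the standard library. The helper `records_to_csv` is a hypothetical name, and the keys `field_a` and `field_b` are made-up placeholders: apart from `name`, the actual keys returned by the endpoint are not shown above, so inspect `data[0].keys()` first.

```python
import csv
import io


def records_to_csv(records, columns=None):
    """Serialize a list of dicts (one per company) to CSV text."""
    if columns is None:
        # Collect column names in first-seen order across all records,
        # because individual records may omit some keys.
        columns = []
        for record in records:
            for key in record:
                if key not in columns:
                    columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",             # fill missing keys with an empty cell
        extrasaction="ignore",  # skip keys that are not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder records standing in for the parsed JSON response;
# only the "name" key is known from the answer above.
sample = [
    {"name": "3M", "field_a": "a1"},
    {"name": "ABB", "field_a": "a2", "field_b": "b2"},
]
print(records_to_csv(sample))
```

If pandas is available, the same list of dicts can be passed directly to `pd.DataFrame(data)`, which is closer to what the question's code was trying to build.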

Solution 1
1 Accepted 2018-04-17 22:30:21