Web scraping: getting drop-down menu data with Python

Question

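I am trying to get the list of all countries/regions from the web page https://www.nexmo.com/products/sms. I can see the list displayed in a drop-down. After inspecting the page, I tried the following code, but I must be doing something wrong. I would appreciate some help here.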

import requests
from bs4 import BeautifulSoup
# collect and parse page
page = requests.get('https://www.nexmo.com/products/sms')
soup = BeautifulSoup(page.text, 'html.parser')
# pull all text from the div
name_list = soup.find(class_ ='dropdown-content')
print(name_list)

Answer 1

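This page uses JavaScript to render its HTML, so the drop-down rows are not in the response that requests receives. You can render the page with Selenium. First install Selenium: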

sudo pip3 install selenium

Then get the driver from https://sites.google.com/a/chromium.org/chromedriver/downloads (depending on your operating system, you may need to specify the driver's location):

from selenium import webdriver
from bs4 import BeautifulSoup

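# Launch Chrome through ChromeDriver and let it run the page's JavaScript
browser = webdriver.Chrome()
url = 'https://www.nexmo.com/products/sms'
browser.get(url)
# Grab the rendered HTML, then close the browser
html_source = browser.page_source
browser.quit()
# Parse the rendered HTML and print every drop-down row
soup = BeautifulSoup(html_source, 'html.parser')
for row in soup.find_all(class_='dropdown-row'):
    print(row.text)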

Output:

Afghanistan
Albania
...
Zambia
Zimbabwe
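
One caveat with reading page_source right after get(): if the rows are filled in by a script that finishes later, they may not be in the DOM yet. A minimal sketch of the same Selenium approach with an explicit wait follows. The names fetch_dropdown_rows and clean_names, the 10-second timeout, and the headless flag are illustrative assumptions, not part of the original answer, and the sketch assumes a Selenium release where webdriver.Chrome accepts options=.

```python
def clean_names(raw_texts):
    """Strip whitespace and drop empty entries, keeping the original order."""
    return [text.strip() for text in raw_texts if text and text.strip()]


def fetch_dropdown_rows(url, class_name="dropdown-row", timeout=10):
    """Load url in headless Chrome and return the text of matching elements."""
    # Imported here so clean_names() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # assumption: no visible window needed
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        # Block until at least one matching element is in the DOM;
        # raises TimeoutException if none appears within `timeout` seconds.
        rows = WebDriverWait(browser, timeout).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, class_name))
        )
        # WebElement.text is empty for hidden elements (a closed drop-down),
        # so read the textContent property instead.
        return clean_names(row.get_attribute("textContent") for row in rows)
    finally:
        # Always close the browser, even if the wait times out.
        browser.quit()
```

Calling fetch_dropdown_rows('https://www.nexmo.com/products/sms') and printing each item should give the same kind of list as above, provided the page still uses the dropdown-row class.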

Update

Alternatively, use PyQt5:

On Ubuntu:

sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine

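Other operating systems (recent PyQt5 releases ship QtWebEngine as the separate PyQtWebEngine package):

pip3 install PyQt5 PyQtWebEngine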

Then run:

from bs4 import BeautifulSoup
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView


class Render(QWebEngineView):
    """Load a URL, wait for the page to finish loading, and keep its HTML."""

    def __init__(self, url):
        self.html = None
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        # Run the Qt event loop until _store_html() calls quit()
        self.app.exec_()

    def _loadFinished(self, result):
        # toHtml() is asynchronous: it passes the HTML to the callback
        self.page().toHtml(self._store_html)

    def _store_html(self, data):
        self.html = data
        self.app.quit()

url = 'https://www.nexmo.com/products/sms'
html_source = Render(url).html
soup = BeautifulSoup(html_source, 'html.parser')
for row in soup.find_all(class_='dropdown-row'):
    print(row.text)
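
Both approaches work for the same reason: the rows only exist after the page's JavaScript has run. A quick way to check whether a page needs a browser at all is to look for the target class in the raw HTML first. The sketch below does that with only the standard library. ClassTextCollector and texts_with_class are hypothetical helper names, and the two HTML strings are made-up stand-ins for illustration, not markup copied from the real page.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # > 0 while inside a matching element
        self.parts = []   # text fragments of the current match
        self.texts = []   # finished matches, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start tracking on a match, or go one level deeper inside one.
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # The matching element just closed: store its combined text.
            self.texts.append("".join(self.parts).strip())
            self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def texts_with_class(html, class_name):
    """Return the stripped text of each element whose class list has class_name."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.texts


# Hypothetical markup, for illustration only.
static_html = '<div id="app"></div><script src="app.js"></script>'
rendered_html = ('<div class="dropdown-content">'
                 '<div class="dropdown-row">Afghanistan</div>'
                 '<div class="dropdown-row"> Albania </div></div>')

print(texts_with_class(static_html, "dropdown-row"))    # []
print(texts_with_class(rendered_html, "dropdown-row"))  # ['Afghanistan', 'Albania']
```

If the check comes back empty for the text returned by requests.get(...) but the rows are visible in the browser's inspector, the content is being rendered client-side and one of the browser-based approaches above is needed.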

Answer 1 (0, accepted, 2018-10-31 21:09:25)
