Web scraping: get drop-down menu data with Python

I am trying to get a list of all countries on the webpage https://www.nexmo.com/products/sms. The list is displayed in a drop-down. After inspecting the page, I tried the following code, but I must be doing something wrong. I would appreciate some help.

import requests
from bs4 import BeautifulSoup
# collect and parse page
page = requests.get('https://www.nexmo.com/products/sms')
soup = BeautifulSoup(page.text, 'html.parser')
# pull all text from the div
name_list = soup.find(class_ ='dropdown-content')
print(name_list)

This webpage uses JavaScript to render the HTML, so the raw response fetched by requests does not yet contain the drop-down entries. You can render the page with Selenium. First, install Selenium:
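
A quick way to check whether a class is missing from the static HTML is to count the elements that carry it. The sketch below uses only the standard library; `ClassCounter`, `count_class`, and both sample strings are hypothetical names and markup made up for illustration, not taken from the real page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup, for illustration only (not copied from the real page)
static_html = '<div class="dropdown-content"></div><script src="app.js"></script>'
rendered_html = (
    '<div class="dropdown-content">'
    '<div class="dropdown-row">Afghanistan</div>'
    '<div class="dropdown-row">Albania</div>'
    '</div>'
)

print(count_class(static_html, "dropdown-row"))    # 0
print(count_class(rendered_html, "dropdown-row"))  # 2
```

With the real page you would pass `page.text` from the `requests.get` call in the question; a count of 0 there would suggest that the rows are added later by JavaScript.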

pip3 install selenium

Then get a driver from https://sites.google.com/a/chromium.org/chromedriver/downloads (depending on your OS, you may need to specify the location of the driver).
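
If the driver is not found automatically, its location can be passed explicitly. The following is a minimal sketch assuming Selenium 4, where the path goes through a `Service` object; `make_chrome` and the example path are hypothetical, not part of the original answer.

```python
def make_chrome(driver_path=None):
    """Start Chrome, optionally pointing Selenium at a specific chromedriver.

    `driver_path` is a hypothetical location such as
    '/usr/local/bin/chromedriver'.
    """
    # Imported here so this module can be loaded even where Selenium is absent
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path is None:
        # Rely on chromedriver being discoverable (for example, on PATH)
        return webdriver.Chrome()
    # Selenium 4 style: the driver location is wrapped in a Service object
    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

Calling `browser = make_chrome('/usr/local/bin/chromedriver')` would then take the place of `webdriver.Chrome()` in the code that follows.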

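from selenium import webdriver
from bs4 import BeautifulSoup

# Launch Chrome so the page's JavaScript runs and fills in the drop-down
browser = webdriver.Chrome()
url = 'https://www.nexmo.com/products/sms'
try:
    browser.get(url)
    html_source = browser.page_source
finally:
    # Always close the browser, even if loading fails
    browser.quit()

# Parse the rendered HTML and print each drop-down entry
soup = BeautifulSoup(html_source, 'html.parser')
for row in soup.find_all(class_='dropdown-row'):
    print(row.text)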

Output:

Afghanistan
Albania
...
Zambia
Zimbabwe
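
One caveat with the Selenium code above: `page_source` is read right after `get()` returns, so if the page fills the drop-down asynchronously, the rows may not be present yet. A hedged sketch using Selenium's explicit waits follows; the function name `rendered_page_source` and the 10-second default timeout are assumptions chosen for illustration.

```python
def rendered_page_source(url, class_name="dropdown-row", timeout=10):
    """Load `url` in Chrome and return the HTML once `class_name` is present."""
    # Imported here so this module can be loaded even where Selenium is absent
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Block until at least one matching element is in the DOM; a
        # TimeoutException is raised if none appears within `timeout` seconds
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, class_name))
        )
        return browser.page_source
    finally:
        # Close the browser whether or not the wait succeeded
        browser.quit()
```

The returned string can be passed to `BeautifulSoup(html, 'html.parser')` in the same way as `html_source` above.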

UPDATED

Alternatively, use PyQt5:

On Ubuntu:

sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine

Other OS:

pip3 install PyQt5 PyQtWebEngine

Then run:

from bs4 import BeautifulSoup
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView


class Render(QWebEngineView):
    """Load a URL in a QWebEngineView and capture the rendered HTML."""

    def __init__(self, url):
        self.html = None
        # A QApplication must exist before any widget is constructed
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        # Block here until _store_html() quits the event loop
        self.app.exec_()

    def _loadFinished(self, result):
        # toHtml() is asynchronous: it hands the HTML to the callback
        self.page().toHtml(self._store_html)

    def _store_html(self, data):
        self.html = data
        self.app.quit()


url = 'https://www.nexmo.com/products/sms'
html_source = Render(url).html
soup = BeautifulSoup(html_source, 'html.parser')
for row in soup.find_all(class_='dropdown-row'):
    print(row.text)
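
Both approaches print the rows, while the question asks for a list. A small helper like the one below could collect the names instead; it assumes the row text may carry stray whitespace or repeated entries, which is a guess about the page rather than something shown above. `clean_names` and the sample `rows` are hypothetical.

```python
def clean_names(raw_texts):
    """Strip whitespace, drop blanks, and remove duplicates, keeping order."""
    seen = set()
    names = []
    for text in raw_texts:
        name = " ".join(text.split())  # collapse runs of whitespace
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Hypothetical row texts, shaped like the output shown earlier
rows = ["  Afghanistan\n", "Albania", "", "Albania", "Zambia ", "Zimbabwe"]
print(clean_names(rows))  # ['Afghanistan', 'Albania', 'Zambia', 'Zimbabwe']
```

With either rendering approach, this might be called as `clean_names(row.text for row in soup.find_all(class_='dropdown-row'))`.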
