
Scraping Javascript Text with Python and Selenium

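I'm trying to get the latitude and longitude of TripAdvisor restaurants. The information isn't displayed on the page itself, but I did find it in the following HTML: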

[Screenshot: the latitude and longitude inside a JavaScript block]
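Once the text of that script is in hand, the coordinates can be pulled out with a regular expression. This is a minimal sketch assuming, hypothetically, that the script contains key/value pairs such as "lat": 22.28 and "lng": 114.15; the real key names on TripAdvisor's pages may differ, extract_lat_lng is an illustrative helper rather than part of any library, and the sample string is made up.

```python
import re

# Hypothetical key names: assumes the script writes lat: <number> and lng: <number>,
# with or without quotes around the key. Adjust to match the real page.
LAT_RE = re.compile(r'\blat["\']?\s*:\s*(-?\d+(?:\.\d+)?)')
LNG_RE = re.compile(r'\blng["\']?\s*:\s*(-?\d+(?:\.\d+)?)')


def extract_lat_lng(script_text):
    """Return (latitude, longitude) as floats, or None if either is missing."""
    lat = LAT_RE.search(script_text)
    lng = LNG_RE.search(script_text)
    if lat is None or lng is None:
        return None
    return float(lat.group(1)), float(lng.group(1))


# Made-up sample resembling an inline map-data script.
sample = 'var mapData = {"name": "Example", "lat": 22.2812, "lng": 114.1589};'
print(extract_lat_lng(sample))
```

Searching for each key separately, rather than for a fixed "lat, lng" sequence, keeps the sketch working if the two fields are not adjacent in the script.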

I'm trying to extract all of this information with the following code:

#import libraries
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys

for i in range(0, 30, 30):
    #need this here for when you want more than 30
    while i <= range:
        i = str(i)
        #url format offsets the restaurants in increments of 30 after the oa
        url1 = 'https://www.tripadvisor.com/Restaurants-g294217-oa' + i + '-Hong_Kong.html#EATERY_LIST_CONTENTS'
        r1 = requests.get(url1)
        data1 = r1.text
        soup1 = BeautifulSoup(data1, "html.parser")
        for link in soup1.findAll('a', {'property_title'}):
            #print 'https://www.tripadvisor.com/Restaurant_Review-g294217-' + link.get('href')
            restaurant_url = 'https://www.tripadvisor.com/Restaurant_Review-g294217-' + link.get('href')
            browser = webdriver.Chrome('C:\Python27\Chromedriver\chromedriver.exe')
            # use xpath to get to the information in the JS
            print browser.find_element_by_xpath("""/html/body/script[22]""")

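When I run the code, it tells me the element can't be found. Maybe my brain is a bit fried right now, but if a fresh pair of eyes could look at this and tell me whether I'm doing something wrong, or whether there's another way to do it, I'm all ears.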
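Two details in the code above are worth checking. First, the loop opens a new Chrome window but never calls browser.get(restaurant_url), so the window is still blank when find_element_by_xpath runs, and no element can be found. Second, /html/body/script[22] depends on the exact number of script tags on the page, which can change from page to page. A less brittle option, sketched here with only the standard library, is to scan every script element in the page source (for example browser.page_source after browser.get(...)) and keep the one containing a marker string. ScriptCollector and find_script_containing are hypothetical helper names, and using '"lat"' as the marker is an assumption about how the page names the field.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")  # start a new, empty script body

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append rather than assign.
        if self._in_script:
            self.scripts[-1] += data


def find_script_containing(html, marker):
    """Return the first <script> body that contains `marker`, or None."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    for body in parser.scripts:
        if marker in body:
            return body
    return None


# Made-up page source standing in for browser.page_source.
page = """<html><body>
<script>var tracking = 1;</script>
<script>var mapData = {"lat": 22.2812, "lng": 114.1589};</script>
</body></html>"""
print(find_script_containing(page, '"lat"'))
```

Matching on content instead of position means the lookup keeps working if the page gains or loses an unrelated script tag.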

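When you are using the Selenium webdriver, there's no need for the requests and BeautifulSoup packages, because Selenium can open a web page itself (the job of requests) and read its content itself (the job of BeautifulSoup). Below is a rough outline of what you're trying to do, using Selenium alone.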

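from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


# start one browser and reuse it for every page
browser = webdriver.Chrome(service=Service(r'C:\Python27\Chromedriver\chromedriver.exe'))
# the list URL offsets the restaurants in steps of 30 after "oa";
# raise the stop value (e.g. range(0, 90, 30)) when you want more than 30
for offset in range(0, 30, 30):
    url1 = 'https://www.tripadvisor.com/Restaurants-g294217-oa' + str(offset) + '-Hong_Kong.html#EATERY_LIST_CONTENTS'
    browser.get(url1)  # load the list page in the browser
    # collect the restaurant links before navigating away from the list page
    links = [a.get_attribute('href') for a in browser.find_elements(By.CSS_SELECTOR, 'a.property_title')]
    for restaurant_url in links:
        browser.get(restaurant_url)  # load each restaurant page
        try:
            # use xpath to get to the information in the JS
            script = browser.find_element(By.XPATH, '/html/body/script[22]')
            # script contents are not rendered, so .text is empty; read the source instead
            print(script.get_attribute('innerHTML'))
        except NoSuchElementException:
            print('script not found on', restaurant_url)

browser.quit()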
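Note that range(0, 30, 30) yields only 0, so both versions of the loop fetch just the first list page. Going by the question's comment that the list URL offsets the restaurants in steps of 30 after "oa", a small helper can build the URLs for several pages. list_page_urls is a hypothetical name, and the page size of 30 comes from that comment rather than from anything verified against the site.

```python
# URL template taken from the question; {offset} replaces the number after "oa".
BASE = "https://www.tripadvisor.com/Restaurants-g294217-oa{offset}-Hong_Kong.html#EATERY_LIST_CONTENTS"
PAGE_SIZE = 30  # restaurants per list page, per the question's comment


def list_page_urls(pages, page_size=PAGE_SIZE):
    """Return the list-page URLs for the first `pages` pages."""
    return [BASE.format(offset=offset) for offset in range(0, pages * page_size, page_size)]


for url in list_page_urls(3):
    print(url)
```

Each of these URLs can be passed to browser.get(...) in place of url1.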


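Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.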

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM