
Scraping a JavaScript URL, but Selenium returns an empty string

I am trying to open, and then scrape data from, the URL contained in a script tag like this:

<script src="http://includes.mpt-static.com/data/7CE5047496" type="text/javascript" charset="utf-8"></script>

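I tried using Selenium to retrieve/open the URL, but it only returns a blank string. I think this is because clicking the src URL directly opens a page containing the data table I want, whereas copying and pasting the URL into the browser returns nothing. Also, a new src URL is generated every time I reload the page. Does anyone know why this happens?

URL: view-source:http://mypricetrack.com/amazon/B00N2BW2PK

My code: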

import time
from fake_useragent import UserAgent
import requests
import csv
from bs4 import BeautifulSoup
import json
from selenium import webdriver

# Fake User-Agent header
ua = UserAgent(cache=False)
headers = {'User-Agent': ua.random}


# Send the request to the price tracker website
product = 'B00N2BW2PK'
page = requests.get('http://www.mypricetrack.com/amazon/' + product, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
# print(soup.prettify())

# Collect the src URLs of the data scripts
data_link = []
for tag in soup.find_all('script', {'charset': 'utf-8'}):
    data_link.append(tag['src'])
string2 = data_link[1]
print(string2)

# Open the data URL

driver = webdriver.Firefox()
driver.get(string2)
time.sleep(5)
htmlSource = driver.page_source
print(htmlSource)

The JavaScript will not be served unless you make the request with the correct "Referer" header.

Selenium is a bit of an overkill here; you can get it with Python requests:

import re

import requests
from bs4 import BeautifulSoup

# Emulate a browser with proper headers
session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1664.3 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.8,es;q=0.6'
})

# Go to the product page
product_page = 'http://mypricetrack.com/amazon/B00N2BW2PK'
res = session.get(product_page)

# Find the <script> tag that points at the data URL
soup = BeautifulSoup(res.text, 'html.parser')
link = soup.find('script', {'src': re.compile('http://includes.mpt-static.com/data')})
link_src = link['src']

# Fetch the JS content, sending the product page as the Referer
js_content = session.get(link_src, headers={'Referer': product_page}).text
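If you would rather not depend on requests and BeautifulSoup, the same idea can be sketched with the Python 3 standard library alone. This is only a rough sketch under a few assumptions: the helper names (`find_data_src`, `build_data_request`, `fetch_data`) are made up for illustration, the URL prefix is taken from the script tag in the question, and the site may also insist on a browser-like User-Agent, which this sketch does not set. Because the src URL changes on every page load, the page and the data script are fetched back to back.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Prefix taken from the <script> tag shown in the question (assumed to be stable).
DATA_PREFIX = 'http://includes.mpt-static.com/data'


class ScriptSrcFinder(HTMLParser):
    """Collect the src of every <script> tag whose src starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != 'script':
            return
        src = dict(attrs).get('src')
        if src and src.startswith(self.prefix):
            self.matches.append(src)


def find_data_src(html, prefix=DATA_PREFIX):
    """Return the first matching script src found in html, or None."""
    finder = ScriptSrcFinder(prefix)
    finder.feed(html)
    return finder.matches[0] if finder.matches else None


def build_data_request(src, referer):
    """Build (but do not send) a GET request for src with a Referer header."""
    return Request(src, headers={'Referer': referer})


def fetch_data(product_page):
    """Fetch the product page, then its data script (needs network access)."""
    html = urlopen(product_page).read().decode('utf-8', errors='replace')
    src = find_data_src(html)
    if src is None:
        raise ValueError('no data script found on ' + product_page)
    response = urlopen(build_data_request(src, product_page))
    return response.read().decode('utf-8', errors='replace')
```

Calling `fetch_data('http://mypricetrack.com/amazon/B00N2BW2PK')` would then return the script body as text, provided the site still behaves as described above. Keeping the parsing and the request building in separate functions makes each step easy to check without touching the network.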

