
Python web scraping with requests and BeautifulSoup

I want to scrape the PSN store, specifically the link below. I'm trying to get the game data and the prices of the items on sale.

https://store.playstation.com/#!/en-us/2-for-1/cid=STORE-MSF77008-PLAYCOLLMULTIBUY

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

The data I want is what you see when you right-click the web page and choose Inspect. For example, for Firewatch it looks like this:

<h3 class="cellTitle">Firewatch</h3>
<li class="buyPrice ">$19.99</li>

Now when I print soup.prettify(), I get this:

html,body,div,span,applet,object,iframe,h1,h2,h3,h4,h5,h6,p,blockquote,pre,a,abbr,acronym,address,big,cite,code,del,dfn,em,img,ins,kbd,q,s,samp,small,strike,strong,sub,sup,tt,var,b,u,i,center,dl,dt,dd,ol,ul,li,fieldset,form,label,legend,table,caption,

with none of the actual data.

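I must be doing something wrong with these functions, but the guides I'm reading and other people's questions all seem to do exactly what I did.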

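I looked into it, and this site is a bit tricky. If you inspect the link in a browser, you will see a "loading..." text. When you make the request yourself, you only get that part of the page; the rest of the data is never in the response because it is loaded afterwards by JavaScript. You could use something like a Selenium-based solution for this site.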
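One way to confirm this diagnosis is to check whether the class names seen in the browser inspector appear anywhere in the raw HTML that the server returns. The following is a minimal sketch that uses only the standard library; `ClassCollector` and `has_class` are hypothetical helpers, and both sample HTML strings are made up for illustration rather than taken from the real response.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML string."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def has_class(html, css_class):
    """Return True if any tag in `html` carries `css_class`."""
    collector = ClassCollector()
    collector.feed(html)
    return css_class in collector.classes


# Made-up samples: a JavaScript shell versus a rendered fragment.
shell_html = '<html><body><div class="loading">loading...</div></body></html>'
rendered_html = '<h3 class="cellTitle">Firewatch</h3><li class="buyPrice ">$19.99</li>'

print(has_class(shell_html, "cellTitle"))     # False
print(has_class(rendered_html, "cellTitle"))  # True
```

Passing `r.text` from the question's request to `has_class` would show whether `cellTitle` is present before any JavaScript runs. If it is not, no parser will find the data in that response.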

You can do this with the help of PhantomJS (http://phantomjs.org/download.html) and Selenium.

Steps:
1. In a terminal or cmd, run: pip install selenium
2. Download PhantomJS, unzip it, and put "phantomjs.exe" in your Python path, for example C:\Python27 on Windows.
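Before running the scraper, it can help to confirm that the phantomjs executable is discoverable from Python. This is a small sketch using the standard library's `shutil.which`; `find_executable` is a hypothetical helper name, and the check only inspects the PATH environment variable.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_executable("phantomjs")
if path is None:
    print("phantomjs was not found on PATH")
else:
    print("phantomjs found at", path)
```

If the executable lives somewhere that is not on PATH, the older Selenium releases that still include the PhantomJS driver accept its location through the `executable_path` argument of `webdriver.PhantomJS`.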

Then use this code, and it will give you the desired result:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://store.playstation.com/#!/en-us/2-for-1/cid=STORE-MSF77008-PLAYCOLLMULTIBUY"

# PhantomJS support exists only in older Selenium releases (it was removed in Selenium 4).
driver = webdriver.PhantomJS()
driver.get(url)

# Wait up to 10 seconds for JavaScript to render the first game title.
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".cellTitle")))

# Give the remaining cells a moment to render before collecting them.
time.sleep(2)

gamenames = driver.find_elements(By.CLASS_NAME, "cellTitle")
prices = driver.find_elements(By.CLASS_NAME, "buyPrice")  # no trailing space in a class name
links = driver.find_elements(By.CLASS_NAME, "permalink")

# Only print when every game has a matching price and link.
if len(gamenames) == len(prices) == len(links):
    for name, price, link in zip(gamenames, prices, links):
        print("The Name of Game is :" + name.text
              + " The Price for Which is : " + price.text
              + " The url for it is: " + link.get_attribute("href"))
else:
    print("Parsing failed because some data was not parsed properly. Try again.")
driver.quit()

It will print:

The Name of Game is :Yu-Gi-Oh! Legacy of the Duelist The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/yu-gi-oh-legacy-of-the-duelist/cid=UP0101-CUSA02718_00-YGOLEGACYOFDUELB
The Name of Game is :Firewatch The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/firewatch/cid=UP0146-CUSA04107_00-FIREWATCH0000000
The Name of Game is :The Escapists The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/the-escapists/cid=UP4064-CUSA01880_00-THEESCAPISTS0000
The Name of Game is :Oxenfree The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/oxenfree/cid=UP0962-CUSA04950_00-OXENBASEENUS0000
The Name of Game is :Duke Nukem 3D: 20th Anniversary World Tour The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/duke-nukem-3d-20th-anniversary-world-tour/cid=UP0292-CUSA04899_00-PAGODA0000000000
The Name of Game is :Primal Carnage: Extinction The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/primal-carnage-extinction/cid=UP0505-CUSA03371_00-PRIMALCARNAGE000
The Name of Game is :The Bunker The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/the-bunker/cid=UP4459-CUSA06057_00-THEBUNKERGAMEPS4
The Name of Game is :Shantae and the Pirate's Curse The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/shantae-and-the-pirate's-curse/cid=UP2053-CUSA01609_00-SHANTAECURSENA01
The Name of Game is :Pure Pool The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/pure-pool/cid=UP2070-CUSA00328_00-UPUREPOOL0000001
The Name of Game is :Banner Saga 2 The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/banner-saga-2/cid=UP0134-CUSA04444_00-THEBANNERSAGA2VE
The Name of Game is :Armello™ The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/armello/cid=UP1120-CUSA03300_00-00ARMELLOONESCEA
The Name of Game is :Gone Home: Console Edition The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/gone-home-console-edition/cid=UP1012-CUSA01228_00-GONEHOME00000000
The Name of Game is :Amplitude The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/amplitude/cid=UP8802-CUSA02480_00-HMXAMPLITUDE2015
The Name of Game is :Dangerous Golf™ The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/dangerous-golf/cid=UP1898-CUSA05385_00-TFEDANGEROUSGOLF
The Name of Game is :Pure Hold'em World Poker Championship The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/pure-hold'em-world-poker-championship/cid=UP2070-CUSA01104_00-UPUREPOKER000001
The Name of Game is :Hard Reset Redux The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/hard-reset-redux/cid=UP1050-CUSA04041_00-HARDRESET0000000
The Name of Game is :Lifeless Planet: Premier Edition The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/lifeless-planet-premier-edition/cid=UP0604-CUSA05475_00-LIFELESSPLANETPS
The Name of Game is :The Escapists: The Walking Dead The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/the-escapists-the-walking-dead/cid=UP4064-CUSA04182_00-THEESCAPISTSWD00
The Name of Game is :100ft Robot Golf The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/100ft-robot-golf/cid=UP0476-CUSA04678_00-100FTGAMEPS4SIEA
The Name of Game is :Kholat The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/kholat/cid=UP1561-CUSA04464_00-KHOLATGAME000000
The Name of Game is :Pure Chess® Complete Bundle The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/pure-chess-complete-bundle/cid=UP2070-CUSA00240_00-B000000000000337
The Name of Game is :Rogue Stormers The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/rogue-stormers/cid=UP4402-CUSA06052_00-ROGUESTORMERS000
The Name of Game is :SNOW Beta The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/snow-beta/cid=UP2862-CUSA06096_00-0000000000000001
The Name of Game is :Assault Suit Leynos The Price for Which is : $19.99The url for it is: https://store.playstation.com/#!/en-us/games/assault-suit-leynos/cid=UP4034-CUSA04727_00-ASLEYNOS00000000
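If the goal is to keep the data rather than just print it, the three lists can be paired into records and written out as CSV. The sketch below assumes the names, prices, and URLs have already been extracted as plain strings; `build_records` and `to_csv` are hypothetical helpers, and the sample values are copied from the first two lines of the output above.

```python
import csv
import io


def build_records(names, prices, urls):
    """Pair up names, prices, and URLs; raise if the lists are misaligned."""
    if not (len(names) == len(prices) == len(urls)):
        raise ValueError("names, prices, and urls must have the same length")
    return [
        {"name": n, "price": p, "url": u}
        for n, p, u in zip(names, prices, urls)
    ]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price", "url"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Sample values copied from the first two lines of the output above.
records = build_records(
    ["Yu-Gi-Oh! Legacy of the Duelist", "Firewatch"],
    ["$19.99", "$19.99"],
    [
        "https://store.playstation.com/#!/en-us/games/yu-gi-oh-legacy-of-the-duelist/cid=UP0101-CUSA02718_00-YGOLEGACYOFDUELB",
        "https://store.playstation.com/#!/en-us/games/firewatch/cid=UP0146-CUSA04107_00-FIREWATCH0000000",
    ],
)
print(to_csv(records))
```

Raising an error on mismatched lengths mirrors the length check in the script: pairing by position is only safe when every game has exactly one price and one link.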

Hope this is what you are looking for.

