How to scrape data from website using VBA?

I'm a newbie at VBA and was trying to extract flight prices from Expedia (SG to Bangkok) for some practice. My code is not working out too well, unfortunately. It only returned one price (and I have no idea where it came from). Would really appreciate it if anyone could help me out. Thank you!

Sub ExtractRaw()
    Dim wb As Workbook
    Dim ws As Worksheet
    Set wb = ThisWorkbook
    Set ws = wb.Sheets("Sheet1")

    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .navigate "https://www.expedia.com.sg/Flights-Search?rfrr=TG.LP.SrchWzd.Flight&langid=2057&trip=OneWay&leg1=from:Singapore,%20Singapore%20(SIN-Changi),to:Bangkok,%20Thailand%20(BKK-Suvarnabhumi%20Intl.),departure:" & DateAdd("d", 1, Date) & "TANYT&passengers=children:0,adults:1,seniors:0,infantinlap:Y&options=cabinclass:economy,sortby:price,carrier:&mode=search&paandi=true"

        Do While ie.Busy
            DoEvents
        Loop

        Dim doc As HTMLDocument
        Set doc = ie.document
        While ie.readyState <> 4
        Wend
        On Error Resume Next

        Dim i As Integer
        For i = 0 To 200
            Range("A1").Value = ie.document.getElementById("flight-listing-container").getElementsByClassName("dollars price-emphasis")(i).innerText
        Next i
        ie.Quit
        Application.EnableEvents = True
    End With
End Sub

I think your issue lies here:

Dim i As Integer
For i = 0 To 200
    Range("A1").Value = ie.document.getElementById("flight-listing-container").getElementsByClassName("dollars price-emphasis")(i).innerText
Next i

You're looping an arbitrary 200 times and overwriting cell A1 with the price on every iteration. This means A1 will always end up holding the inner text of the last element that matched.

Try:

Range("A1").Offset(i,0).Value = ie.document.getElementById("flight-listing-container").getElementsByClassName("dollars price-emphasis")(i).innerText

That'll give you a list of all the inner text it finds on that element, running down column A until your loop terminates.

Really, I think you should determine how many iterations you need before diving into the loop, unless you've got a good reason to loop 200 times every time.
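As a rough sketch of that idea, the loop bound can come from the number of matched elements rather than a fixed 200. This assumes `ie` is the same InternetExplorer.Application object as in the question and that the page has finished loading; `WritePrices`, `container` and `prices` are names made up for illustration, and the element id and class names are simply the ones from the question, which may not match the live page.

```vba
' Sketch only: assumes ie has already navigated and finished loading.
' WritePrices, container and prices are hypothetical names for illustration.
Sub WritePrices(ie As Object)
    Dim container As Object
    Dim prices As Object
    Dim i As Long

    Set container = ie.document.getElementById("flight-listing-container")
    If container Is Nothing Then Exit Sub   ' listing container not found on the page

    ' Collect the matching price elements once, then size the loop from the result
    Set prices = container.getElementsByClassName("dollars price-emphasis")

    For i = 0 To prices.Length - 1
        ' One price per row, going down column A from A1
        ThisWorkbook.Sheets("Sheet1").Range("A1").Offset(i, 0).Value = prices(i).innerText
    Next i
End Sub
```

Because the loop stops at `prices.Length - 1`, it never indexes past the last match, so it should not need `On Error Resume Next` to swallow out-of-range errors.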
