
How to fetch data from Skyscanner?

I am new to Python and have been asked to grab the dynamic data from www.skyscanner.net.

Can someone guide me on doing so?

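import requests
import lxml.html as lh

url = 'http://www.skyscanner.net/transport/flights/sin/lhr/131231/140220/'
# Fetch the page; a plain page load is a GET request
response = requests.get(url)

# Parse the returned HTML into an element tree
tree = lh.document_fromstring(response.content)
# Print the markup itself rather than the element object's repr
print(lh.tostring(tree, pretty_print=True, encoding='unicode'))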

All I did was find the pattern in the URL and try to grab the page from there. However, no data was pulled. I learnt that Python is the best language for this kind of task, but the libraries seem too large and I do not know where to start.

My name is Piotr. I work for Skyscanner in the Data Acquisition team, which I assume you are applying to join :-) As this is part of your task, I wouldn't like to give you a straight answer; however, you might consider:

  • Understand how our site works: how the requests are built and what data you can find in the HTTP response.
  • You could use libraries that help you parse XML/JSON responses.

I think that's all I can say :-)

Cheers, Piotr
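
To illustrate the second hint in the answer above, here is a minimal sketch of parsing a JSON response with Python's standard json module. The payload and its field names (itineraries, origin, destination, price) and the helper cheapest_price are invented for illustration; they are assumptions, not Skyscanner's actual response format, which you would have to discover by inspecting the site's requests yourself.

```python
import json

# Hypothetical payload: the field names below are invented for illustration
# and are NOT Skyscanner's real response format.
SAMPLE_RESPONSE = '''
{
  "itineraries": [
    {"origin": "SIN", "destination": "LHR", "price": 812.5},
    {"origin": "SIN", "destination": "LHR", "price": 640.0}
  ]
}
'''


def cheapest_price(raw_json):
    """Return the lowest price in the payload, or None if it has no itineraries."""
    data = json.loads(raw_json)  # decode the JSON text into dicts and lists
    prices = [item["price"] for item in data.get("itineraries", [])]
    return min(prices) if prices else None


print(cheapest_price(SAMPLE_RESPONSE))
```

If the body of a requests response is JSON, response.json() performs the same decoding step as json.loads here, so the rest of the function would stay the same.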

