How to web-scrape a (JavaScript?) website?

I tried to scrape data from a website called flightradar24.

With my code, I look up the name of the airport, and I also want to scrape the "Arrivals" table. Scraping the name works, because it is just an h1 HTML element, but when I try to scrape the table with the same code I don't get any values, only the object names (maybe because the page uses JavaScript?).

Is there any way I can scrape this part of the page? (Python 2.7)

I tried this:

import urllib2, sys
from BeautifulSoup import BeautifulSoup

site= "https://www.flightradar24.com/data/airports/bud/arrivals"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
name = soup.find('h1' , attrs={'class' : 'airport-name'})
print name

table = soup.find('div', { "class" : "row cnt-schedule-table" })
print table

This is what I get when I print the table:

 <div class="row cnt-schedule-table"><label class="mbm">ARRIVALS</label><table class="table table-condensed table-hover data-table mnt-15"><thead><tr class="hidden-xs hidden-sm"><th class="w-80">TIME</th><th class="w-90">FLIGHT</th><th>FROM</th><th>AIRLINE</th><th class="w-120">AIRCRAFT</th><th class="w-10"></th><th class="w-160">STATUS</th></tr><tr ng-cloak="ng-cloak" data-ng-class="{hidden: btnLoadEarlier === false}" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length &gt; 0)"> 0)"&gt;<td colspan="7" class="text-center"><button data-mode="arrivals" data-page="-1" data-timestamp="{{currentUtcTimestampRender / 1000}}" ng-click="loadMoreFlights($event)" data-current-page="{{airportView.schedule.arrivals.page.current}}" data-loading-text='&lt;i class="fa fa-circle-o-notch fa-spin"&gt;&lt;/i&gt; Loading earlier flights...' class="btn btn-table-action btn-flights-load">Load earlier flights</button></td></tr></thead><tbody><tr ng-cloak="ng-cloak" class="loader" ng-show="(isFetching == true)"><td colspan="7" class="text-center"><i class="fa fa-spinner fa-pulse"></i> Loading...</td></tr><tr ng-cloak="ng-cloak" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length == 0)"><td colspan="7" class="text-center">Sorry, we don't have any information about flights for this airport</td></tr><tr ng-cloak="ng-cloak" class="hidden-md hidden-lg" ng-repeat="objFlight in airportView.schedule.arrivals.data track by $index" ng-show="(isFetching == false)"><td colspan="7" class="state-block-{{objFlight.flight.status.generic.status.color || 'gray'}}"><div class="row"><div class="col-xs-12 col-sm-12 p-xxs"><span ng-bind-html="objFlight.flight.statusMessage.text | unsafe"></span> {{objFlight.flight.status.generic.eventTime.utc * 1000 || '' | date: timeFormat: timeZone}}</div></div><div class="row"><div class="col-xs-3 col-sm-3 p-xxs"><i class="fa fa-clock-o"></i> <span>{{objFlight.flight.time.scheduled.arrival * 1000 || '-' | date: 
timeFormat : timeZone}}</span></div><div class="col-xs-3 col-sm-3 p-xxs"><i class="fa fa-tag"></i> <a class="notranslate" ng-href="/data/flights/{{objFlight.flight.identification.number.default | lowercase}}">{{objFlight.flight.identification.number.default}}</a></div><div class="col-xs-6 col-sm-6 p-xxs"><i class="fa fa-map-marker"></i> <span ng-bind-html="objFlight.flight.airport.origin.position.region.city || '-' | unsafe">{{objFlight.flight.airport.origin.position.region.city}} </span><a class="notranslate" ng-href="/data/airports/{{objFlight.flight.airport.origin.code.iata | lowercase}}" title="{{objFlight.flight.airport.origin.name}}, {{objFlight.flight.airport.origin.position.country.name}}">({{objFlight.flight.airport.origin.code.iata}})</a></div></div><div class="row"><div class="col-xs-3 col-sm-3 p-xxs" title="{{objFlight.flight.aircraft.model.text || ''}}"><i class="fa fa-plane"></i> {{objFlight.flight.aircraft.model.code || '-'}}</div><div class="col-xs-3 col-sm-3 p-xxs"><a ng-show="(objFlight.flight.aircraft.registration)" class="notranslate" ng-href="/data/aircraft/{{objFlight.flight.aircraft.registration | lowercase}}">{{objFlight.flight.aircraft.registration}}</a></div><div class="col-xs-6 col-sm-6 p-xxs">{{ objFlight.flight.airline.name || '-'}}</div></div></td></tr><tr ng-cloak="ng-cloak" class="hidden-xs hidden-sm" ng-repeat="objFlight in airportView.schedule.arrivals.data track by $index" ng-show="(isFetching == false)" data-date="{{(objFlight.flight.time.scheduled.arrival * 1000) | date: 'EEEE, MMM dd' : timeZone}}" tbl-render-directive="tbl-render-directive"><td>{{objFlight.flight.time.scheduled.arrival * 1000 || '-' | date: timeFormat : timeZone}}</td><td class="pls cell-flight-number"><a class="chevron-toggle" ng-if="(objFlight.flight.identification.codeshare != null)" data-codeshare="{{objFlight.flight.identification.codeshare}}"></a> <a class="notranslate" ng-href="/data/flights/{{objFlight.flight.identification.number.default | 
lowercase}}">{{objFlight.flight.identification.number.default}}</a></td><td><div ng-show="(objFlight.flight.airport.origin)"><span class="hide-mobile-only">{{objFlight.flight.airport.origin.position.region.city}} </span><a class="fs-10 fbold notranslate" ng-href="/data/airports/{{objFlight.flight.airport.origin.code.iata | lowercase}}" title="{{objFlight.flight.airport.origin.name}}, {{objFlight.flight.airport.origin.position.country.name}}">({{objFlight.flight.airport.origin.code.iata}})</a></div><div ng-show="!(objFlight.flight.airport.origin)">-</div></td><td ng-bind-html=" objFlight.flight.airline.name || '-' | unsafe" title="{{ objFlight.flight.airline.name || ''}}" class="cell-airline"></td><td><span class="notranslate" ng-show="(objFlight.flight.aircraft.model.code)">{{objFlight.flight.aircraft.model.code}} </span><a ng-show="(objFlight.flight.aircraft.registration)" class="fs-10 fbold notranslate" ng-href="/data/aircraft/{{objFlight.flight.aircraft.registration | lowercase}}">({{objFlight.flight.aircraft.registration}}) </a><span ng-if="(!objFlight.flight.aircraft.model.code &amp;&amp; !objFlight.flight.aircraft.registration)">-</span></td><td><div class="state-block {{objFlight.flight.status.generic.status.color || 'gray'}}"></div></td><td><span ng-bind-html="objFlight.flight.statusMessage.text | unsafe"></span> {{objFlight.flight.status.generic.eventTime.utc * 1000 || '' | date: timeFormat: timeZone}}</td></tr></tbody><tfoot><tr ng-cloak="ng-cloak" data-ng-class="{hidden: btnLoadLater === false }" ng-show="(isFetching == false &amp;&amp; airportView.schedule.arrivals.data.length &gt; 0 &amp;&amp; airportView.schedule.arrivals.page.current &lt; airportView.schedule.arrivals.page.total)"> 0 &amp;&amp; airportView.schedule.arrivals.page.current &lt; airportView.schedule.arrivals.page.total)"&gt;<td colspan="7" class="text-center"><button data-mode="arrivals" data-page="2" data-timestamp="{{currentUtcTimestampRender / 1000 | int}}" 
ng-click="loadMoreFlights($event)" data-current-page="{{airportView.schedule.arrivals.page.current}}" data-loading-text='&lt;i class="fa fa-circle-o-notch fa-spin"&gt;&lt;/i&gt; Loading later flights...' class="btn btn-table-action btn-flights-load">Load later flights</button></td></tr><tr ng-cloak="ng-cloak" ng-show="(isFetching == false)"><td colspan="7">* All times are in {{(airportView.schedule.arrivals.data &amp;&amp; timeZone.toUpperCase() == 'UTC' ? 'UTC' : 'local')}} timezone</td></tr></tfoot></table></div>
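
The `{{...}}` placeholders and `ng-*` attributes in this output suggest that the table is a client-side template that the browser fills in after the page loads, which would explain why the downloaded HTML contains no flight values. A minimal sketch for confirming that, assuming Python 3 and only the standard library (the `find_bindings` helper and the `sample` string are illustrative, not part of the original code), lists the distinct placeholder expressions left in a piece of HTML:

```python
import re

# Matches template bindings such as {{objFlight.flight.airline.name || '-'}}
BINDING = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def find_bindings(html):
    """Return the distinct {{...}} expressions found in html, in order."""
    seen = []
    for expr in BINDING.findall(html):
        expr = " ".join(expr.split())  # collapse whitespace and line breaks
        if expr not in seen:
            seen.append(expr)
    return seen


# Illustrative fragment modelled on the printed table above.
sample = (
    "<td>{{objFlight.flight.identification.number.default}}</td>"
    "<td title=\"{{ objFlight.flight.airline.name || ''}}\"></td>"
)

for expr in find_bindings(sample):
    print(expr)
```

If this returns any expressions for the table markup, the values are probably loaded separately by the page's scripts, and the field paths it prints hint at what the underlying data looks like.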

The code from the article linked in the answer doesn't work:

import urllib2
from bs4 import BeautifulSoup
import json

# new url
url = 'https://www.flightradar24.com/data/airports/bud/arrivals'

# read all data
page = urllib2.urlopen(url).read()

# convert json text to python dictionary
data = json.loads(page)

print(data['row cnt-schedule-table'])

Here is another Stack Overflow article that has a solution to a very similar problem. It seems that you need to request the URL that the page loads its data from when it renders, rather than the URL you would normally open in a browser.
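
As a rough sketch of that idea, assuming Python 3 (in Python 2.7, `urllib2.Request` and `urllib2.urlopen` take the place of `urllib.request`): once the data URL is known, for example from the browser's network tab, the response can be read as JSON and the rows built by hand. The real data URL is not given here, so `fetch_json` takes it as a parameter and is not called below. The payload layout is also an assumption: it is guessed to mirror the names in the page template (`airportView.schedule.arrivals.data`, `flight.identification.number.default`, and so on), and the sample values are made up.

```python
import json
import urllib.request


def fetch_json(url):
    """Download url and decode the response body as JSON."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def dig(obj, path, default="-"):
    """Follow a dotted path through nested dicts; return default on a gap."""
    for key in path.split("."):
        if not isinstance(obj, dict) or obj.get(key) is None:
            return default
        obj = obj[key]
    return obj


def arrivals_rows(payload):
    """Build one (time, flight, origin, airline) tuple per arrival.

    Assumes the payload mirrors the template's field names; adjust the
    paths to whatever the real response contains.
    """
    flights = dig(payload, "airportView.schedule.arrivals.data", default=[])
    return [
        (
            dig(item, "flight.time.scheduled.arrival"),
            dig(item, "flight.identification.number.default"),
            dig(item, "flight.airport.origin.code.iata"),
            dig(item, "flight.airline.name"),
        )
        for item in flights
    ]


# Made-up payload in the assumed shape, used instead of a live request.
sample_payload = {"airportView": {"schedule": {"arrivals": {"data": [
    {"flight": {
        "time": {"scheduled": {"arrival": 1500000000}},
        "identification": {"number": {"default": "XX123"}},
        "airport": {"origin": {"code": {"iata": "AAA"}}},
        "airline": {"name": "Example Air"},
    }},
]}}}}

for row in arrivals_rows(sample_payload):
    print(row)
```

With a real endpoint, `arrivals_rows(fetch_json(data_url))` would replace the sample payload, where `data_url` is whatever URL the page actually requests.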
