
Scrape dynamic content using Python Scrapy

I would like to scrape the 'calendar' content in this link: https://gomore.dk/lejebil/27035

[Image: the calendar information I want]

I wonder whether I could use Python Scrapy, without Selenium, to crawl this content, as I can't find any relevant information in the network tab. Thanks!

After half a day of research, I noticed I could use scrapy-splash to retrieve the JS-processed content, which gives me the full content of the webpage, including the calendar information. However, the calendar information does not tally with what I expected: e.g. hour 1 for weekday 1 should be "danger", but it is not.

The webpage uses data-hour to represent the 24 hours of each day, and data-weekday 0 - 6 to represent Sunday, Monday, ..., Saturday. class="danger" indicates that the calendar slot is blocked (shown in red).

   <tr data-hour="0">
      <td class="hour">
        <div>
          <small>00.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="1">
      <td class="hour">
        <div>
          <small>01.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="2">
      <td class="hour">
        <div>
          <small>02.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="3">
      <td class="hour">
        <div>
          <small>03.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="4">
      <td class="hour">
        <div>
          <small>04.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="5">
      <td class="hour">
        <div>
          <small>05.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="6">
      <td class="hour">
        <div>
          <small>06.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="7">
      <td class="hour">
        <div>
          <small>07.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="8">
      <td class="hour">
        <div>
          <small>08.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="9">
      <td class="hour">
        <div>
          <small>09.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="10">
      <td class="hour">
        <div>
          <small>10.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="11">
      <td class="hour">
        <div>
          <small>11.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="12">
      <td class="hour">
        <div>
          <small>12.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4" class="danger"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="13">
      <td class="hour">
        <div>
          <small>13.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="14">
      <td class="hour">
        <div>
          <small>14.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="15">
      <td class="hour">
        <div>
          <small>15.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="16">
      <td class="hour">
        <div>
          <small>16.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="17">
      <td class="hour">
        <div>
          <small>17.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend danger"></td>
    </tr>

    <tr data-hour="18">
      <td class="hour">
        <div>
          <small>18.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="19">
      <td class="hour">
        <div>
          <small>19.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="20">
      <td class="hour">
        <div>
          <small>20.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="21">
      <td class="hour">
        <div>
          <small>21.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>

    <tr data-hour="22">
      <td class="hour">
        <div>
          <small>22.00</small>
        </div>
      </td>
      <td data-weekday="1"></td>
      <td data-weekday="2" class="danger"></td>
      <td data-weekday="3" class="danger"></td>
      <td data-weekday="4"></td>
      <td data-weekday="5" class="danger"></td>
      <td data-weekday="6" class="cal-weekend danger"></td>
      <td data-weekday="0" class="cal-weekend"></td>
    </tr>
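
To compare what Splash returns against what the browser shows, it may help to reduce the table to a set of blocked (hour, weekday) pairs. The following is only a minimal sketch using the standard library, assuming the markup looks like the rows above; `OccupancyParser`, `blocked_slots`, and `SAMPLE` are hypothetical names introduced for illustration, and `SAMPLE` is an abbreviated copy of the data-hour="18" row.

```python
from html.parser import HTMLParser

# Weekday numbering as described in the post: 0 = Sunday ... 6 = Saturday.
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


class OccupancyParser(HTMLParser):
    """Collect (hour, weekday) pairs for cells whose class list has 'danger'."""

    def __init__(self):
        super().__init__()
        self.current_hour = None  # hour of the <tr> currently being read
        self.blocked = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and "data-hour" in attrs:
            self.current_hour = int(attrs["data-hour"])
        elif (tag == "td" and "data-weekday" in attrs
              and self.current_hour is not None):
            # A cell may carry several classes, e.g. "cal-weekend danger".
            classes = (attrs.get("class") or "").split()
            if "danger" in classes:
                self.blocked.add(
                    (self.current_hour, int(attrs["data-weekday"]))
                )

    def handle_endtag(self, tag):
        if tag == "tr":
            self.current_hour = None


def blocked_slots(html):
    """Return the set of (hour, weekday) pairs marked as blocked."""
    parser = OccupancyParser()
    parser.feed(html)
    parser.close()
    return parser.blocked


# Abbreviated copy of one row from the output above (assumption: same markup).
SAMPLE = """
<tr data-hour="18">
  <td class="hour"><div><small>18.00</small></div></td>
  <td data-weekday="1"></td>
  <td data-weekday="2" class="danger"></td>
  <td data-weekday="6" class="cal-weekend danger"></td>
  <td data-weekday="0" class="cal-weekend"></td>
</tr>
"""

if __name__ == "__main__":
    for hour, weekday in sorted(blocked_slots(SAMPLE)):
        print(f"{hour:02d}.00 {WEEKDAYS[weekday]}: blocked")
```

Running the same function on the HTML saved from the browser's element inspector and on the Splash response, then diffing the two sets, should show exactly which cells disagree.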

Is there any chance that the HTML rendered by scrapy-splash can be wrong? The rest of the content seems correct; only this calendar table is off.

https://dgaqgnnkkz5ef.cloudfront.net/assets/application-840c6707422c9d0ee7fb9005972e7c7201803d9c24bbcd23253e6ec7beedd6a1.js is the JS file where they are fetching the data, but I don't have time to research it further. If you look into js-occupancy-calendar and rental_ad_occupancy_calendar/main, you may get some ideas.
