How to get the full html with Beautiful Soup?

Using Beautiful Soup for Python, I'm trying to download data from this site, but the HTML I get back contains only a few lines and, in particular, doesn't contain the data displayed on the site.

I also tried different parsers, such as lxml and html5lib, but the results were similar to the following:

>>> import requests
>>> from bs4 import BeautifulSoup
>>> url = 'http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1'
>>> BeautifulSoup(requests.get(url).text, "html.parser")
<!DOCTYPE html>

<html>
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<title>ArcGIS Dashboards</title>
<meta content="" name="description"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<link href="assets/images/favicon.ico?" rel="icon" type="image/x-icon"/>
<link href="https://js.arcgis.com/3.32/dijit/themes/claro/claro.css" rel="stylesheet" type="text/css"/>
<link href="https://js.arcgis.com/3.32/esri/css/esri.css" rel="stylesheet" type="text/css"/>
<link href="assets/vendor-ff6a5e0c0264e398e1ffaeb015926635.css" rel="stylesheet"/>
<link href="assets/app-light-7137f008b303d663c3645f07f162e89f.css" rel="stylesheet"/>
<script src="assets/amd-config-7e9801fc9c916a27bb75c6f356e09e0d.js"></script>
</head>
<body class="claro">
<script data-amd="true" src="https://js.arcgis.com/3.32/init.js"></script>
<script data-amd-loading="true" src="assets/amd-loading-d8029d0343fa400ebae9865c42984750.js"></script>
<div class="full-height flex-vertical flex-justify-center flex-align-items-center" id="initialLoadingContainer">
<div class="loader is-active">
<div class="loader-bars"></div>
</div>
</div>
</body>
</html>

Am I missing something?

It seems this page loads its content dynamically with JavaScript. Have a look at this article: https://docs.scrapy.org/en/latest/topics/dynamic-content.html. You can inspect the page with the browser's developer tools to try to find the real data source, or alternatively load it with Selenium, which drives a real browser from Python.
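
If the network tab of the developer tools shows the page fetching its numbers as JSON, that response can be parsed directly, with no HTML involved. Here is a minimal sketch, assuming the payload uses the `features` / `attributes` layout that ArcGIS query responses typically have. The field names and values in `SAMPLE_RESPONSE` are invented for illustration and are not taken from the real site:

```python
import json

# Hypothetical payload: the field names and numbers are made up for illustration.
SAMPLE_RESPONSE = """
{
  "features": [
    {"attributes": {"region": "Lombardia", "cases": 120}},
    {"attributes": {"region": "Veneto", "cases": 45}}
  ]
}
"""


def extract_rows(payload):
    """Return the `attributes` dict of every feature in a JSON payload."""
    data = json.loads(payload)
    # .get() avoids a KeyError when the payload has no "features" key.
    return [feature["attributes"] for feature in data.get("features", [])]


rows = extract_rows(SAMPLE_RESPONSE)
for row in rows:
    print(row["region"], row["cases"])
```

With a real endpoint, the string would come from something like `requests.get(endpoint_url).text`, where `endpoint_url` stands for whatever request the network tab reveals.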

The page is rendered by JavaScript, so requests only receives the initial loading shell. You can use Selenium to render it.
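
A quick way to check this diagnosis before reaching for a browser is to see whether the downloaded HTML has script tags but almost no visible text. The sketch below is only a heuristic, written with the standard library so it runs without extra packages. The names `ShellDetector` and `looks_js_rendered`, the threshold of 50 characters, and the shortened asset name in `SHELL` are assumptions made for the example:

```python
from html.parser import HTMLParser


class ShellDetector(HTMLParser):
    """Counts <script> tags and visible text characters in an HTML document."""

    # Tags whose text content is not shown in the page body.
    HIDDEN_TAGS = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.visible_chars = 0
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.visible_chars += len(data.strip())


def looks_js_rendered(html, min_visible_chars=50):
    """Heuristic: scripts are present but almost no visible text is."""
    detector = ShellDetector()
    detector.feed(html)
    detector.close()
    return detector.script_count > 0 and detector.visible_chars < min_visible_chars


# Abbreviated version of the shell returned by requests.
SHELL = """<html><head><title>ArcGIS Dashboards</title>
<script src="assets/amd-config.js"></script></head>
<body class="claro"><script src="https://js.arcgis.com/3.32/init.js"></script>
<div id="initialLoadingContainer"><div class="loader is-active"></div></div>
</body></html>"""

STATIC = "<html><body><p>" + "Plain server-rendered text. " * 5 + "</p></body></html>"

print(looks_js_rendered(SHELL))
print(looks_js_rendered(STATIC))
```

A page can mix server-rendered text with dynamic data, so a negative result does not prove that everything is in the static HTML.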

Code:

from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = 'http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1'

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.set_window_size(1024, 600)
driver.maximize_window()
driver.get(url)
sleep(10)  # <--- waits 10 seconds so that the page can finish rendering
print(driver.page_source)  # <--- this will give you the rendered source code
driver.quit()  # close the browser when done
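
Once `driver.page_source` holds the rendered markup, it can be parsed like any static page. The sketch below uses a short excerpt of the rendered markup as a stand-in for `driver.page_source`, and the standard-library `html.parser` so that it runs without a browser. `ClassTextCollector` and `texts_with_class` are illustrative names, not part of Selenium or Beautiful Soup:

```python
from html.parser import HTMLParser

# Stand-in for driver.page_source: a short excerpt of the rendered markup.
RENDERED = """
<div class="flex-fix flex-align-center allow-shrink margin-left-1 flex-vertical">
  <div class="title no-pointer-events text-ellipsis">Dipartimento della Protezione Civile</div>
  <div class="subtitle text-ellipsis no-pointer-events">Aggiornamento casi COVID-19</div>
</div>
"""


class ClassTextCollector(HTMLParser):
    """Collects the text of every <div> that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.texts = []
        self._depth = 0  # nesting depth inside a matching <div>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth > 0:
            self._depth += 1  # nested <div> inside a match
        elif self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def texts_with_class(html, css_class):
    collector = ClassTextCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.texts


print(texts_with_class(RENDERED, "title"))
print(texts_with_class(RENDERED, "subtitle"))
```

The same string can instead be handed to Beautiful Soup, for example `BeautifulSoup(driver.page_source, "html.parser")`, which is the parser call used in the question.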

You can execute JavaScript in the page using:

driver.execute_script("return document.title")  # runs the script and returns its result to Python

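You can create an explicit wait like this:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Waits up to 10 seconds for the element to be located. Other wait conditions
# include visibility_of_element_located and text_to_be_present_in_element.
WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement")))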
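
An explicit wait is essentially a polling loop: check a condition, sleep briefly, and give up after a deadline. It returns as soon as the condition holds instead of always sleeping for the full time. The function below is a standard-library illustration of that idea, not Selenium's implementation. `wait_until` and `element_ready` are hypothetical names:

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Toy condition: becomes truthy on the third call, like an element
# that appears only after the page's scripts have run.
calls = {"count": 0}


def element_ready():
    calls["count"] += 1
    return "element" if calls["count"] >= 3 else None


found = wait_until(element_ready, timeout=2.0, poll_interval=0.01)
print(found, calls["count"])
```

Compared with a fixed `sleep(10)`, this pattern finishes early when the page is fast and fails loudly when it is too slow.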

The rendered page source printed by the Selenium code above looks like this (truncated):

<html dir="ltr" class="en-gb en dj_webkit dj_chrome dj_contentbox"><head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <title>COVID-19 ITALIA - Desktop</title>
  <meta name="description" content="">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="icon" href="assets/images/favicon.ico?" type="image/x-icon">
  <link href="https://js.arcgis.com/3.32/dijit/themes/claro/claro.css" rel="stylesheet" type="text/css">
  <link href="https://js.arcgis.com/3.32/esri/css/esri.css" rel="stylesheet" type="text/css">
  <link rel="stylesheet" href="assets/vendor-ff6a5e0c0264e398e1ffaeb015926635.css">
  <link rel="stylesheet" href="assets/app-dark-a8116e0262a64a5113c183f5acb0a03b.css">
  <script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/nls/jsapi_en-gb.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/ColorPicker.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/ColorPicker/HexPalette.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/DateTextBox.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/TimeTextBox.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojox/color.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/Legend.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/Scalebar.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/BasemapGallery.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/LayerList.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/Search.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/locator.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/toolbars/draw.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/plugins/FeatureLayerStatistics.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/geometry/geometryEngineAsync.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/geometry/geometryEngine.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojo/fx/easing.js"></script><script type="text/javascript" 
charset="utf-8" src="https://js.arcgis.com/3.32/esri/arcgis/Portal.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/styles/colors.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/moment/locale/en-gb.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojox/gfx/svg.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/Calendar.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/_DateTimeTextBox.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/_Tooltip.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/ColorPicker/colorUtil.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/dijit/HorizontalSlider.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/RadioButton.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/_TimePicker.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojox/color/_base.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/layers/VectorTileLayerImpl.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/AddressCandidate.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/CalendarLite.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/RangeBoundTextBox.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/toolbars/_toolbar.js"></script><script type="text/javascript" charset="utf-8" 
src="https://js.arcgis.com/3.32/esri/workers/WorkerClient.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/styles/basic.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/GenerateRendererTask.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/UniqueValueDefinition.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/ClassBreaksDefinition.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/GenerateRendererParameters.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/generateRenderer.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/ProjectParameters.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/workers/heatmapCalculator.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojox/gfx/filters.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojox/gfx/svgext.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/HorizontalRuleLabels.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/HorizontalSlider.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/CheckBox.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/_RadioButtonMixin.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/_ListMouseMixin.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojox/main.js"></script><script type="text/javascript" 
charset="utf-8" src="https://js.arcgis.com/3.32/dojo/colors.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/layers/nls/VectorTileLayerImpl_en-gb.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/MappedTextBox.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/ClassificationDefinition.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/HorizontalRule.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojo/dnd/move.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/_ListBase.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dijit/form/_CheckBoxMixin.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/dojo/selector/lite.js"></script><script type="text/javascript" charset="utf-8" src="assets/vendor-557b494b34c1b4f592d5f2948d530f35.js"></script><script type="text/javascript" charset="utf-8" src="assets/nickel-122f2be932fe8e42c7401c4190951f4c.js"></script><script type="text/javascript" charset="utf-8" src="assets/moment-timezone-with-data.min-f71eb5eba513b3ab182b567941a82ef5.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/layers/LabelLayer.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/support/pbfDeps.js"></script><script type="text/javascript" charset="utf-8" src="https://js.arcgis.com/3.32/esri/tasks/support/nls/pbfDeps_en-gb.js"></script><script src="assets/amd-config-7e9801fc9c916a27bb75c6f356e09e0d.js"></script>
<style>.cke{visibility:hidden;}</style></head>

<body class="claro ember-application">
  <script src="https://js.arcgis.com/3.32/init.js" data-amd="true"></script>
  <script src="assets/amd-loading-d8029d0343fa400ebae9865c42984750.js" data-amd-loading="true"></script>
  


<!---->
<div id="ember6" class="dashboard-page flex-vertical full panel panel-no-border panel-no-padding position-relative ember-view">  
<!---->
<!---->

  
<!---->
<div style="color:#ffffff;" id="ember8" class="flex-fluid flex-vertical overflow-hidden dashboard-container ember-view">
<div id="ember9" class="flex-fix panel-container flex-vertical top-panel-container ember-view"><div class="margin-container" style="">
<!---->
  <div class="full-container">
                  <div style="" id="ember10" class="header-panel flex-horizontal large ember-view">  <div class="flex-fix flex-align-center margin-left-1">
    <a target="_blank" class="logo-img-btn no-pointer-events">
      <img src="http://opendatadpc.maps.arcgis.com/sharing/rest/content/items/d97ea2b03e824d5ca261998c15204745/data">
    </a>
  </div>

<div class="flex-fix flex-align-center allow-shrink margin-left-1 flex-vertical">
  <div class="title no-pointer-events text-ellipsis">Dipartimento della Protezione Civile</div>
  <div class="subtitle text-ellipsis no-pointer-events">Aggiornamento casi COVID-19</div>
</div>

<div class="selectors-container flex-fluid flex-align-center flex-horizontal flex-justify-end">
<!----></div>

<div id="ember11" class="margin-left-1 flex-fix flex-align-center menu-links dropdown ember-view"><button aria-expanded="false" aria-haspopup="true" tabindex="0" id="ember12" class="btn btn-large dropdown-btn ember-view">        <span id="ember13" class="icon-element ember-view"><svg xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" height="24px" width="24px" viewBox="0 0 24 24" id="ember14" class="ember-view"><path d="M21 6H3V4h18zm0 5H3v2h18zm0 7H3v2h18z"></path></svg></span>

</button>
<nav role="menu" id="ember15" class="dropdown-right dropdown-menu ember-view">
<!---->
        <a target="_blank" href="http://www.governo.it/" role="menu-item" id="ember17" class="dropdown-link dropdown-menu-item ember-view">  <div class="flex-horizontal flex-align-items-center">
<!---->    <div class="flex-fluid text-ellipsis ">Presidenza del Consiglio dei Ministri</div> 
<!---->  </div>

</a>
        <a target="_blank" href="http://www.protezionecivile.it" role="menu-item" id="ember19" class="dropdown-link dropdown-menu-item ember-view">  <div class="flex-horizontal flex-align-items-center">
<!---->    <div class="flex-fluid text-ellipsis ">Dipartimento della Protezione Civile</div> 
<!---->  </div>

</a>
        <a target="_blank" href="http://www.salute.gov.it" role="menu-item" id="ember21" class="dropdown-link dropdown-menu-item ember-view">  <div class="flex-horizontal flex-align-items-center">
<!---->    <div class="flex-fluid text-ellipsis ">Ministero della Salute</div> 
<!---->  </div>

</a>
        <a target="_blank" href="http://arcg.is/081a51" role="menu-item" id="ember23" class="dropdown-link dropdown-menu-item ember-view">  <div class="flex-horizontal flex-align-items-center">
<!---->    <div class="flex-fluid text-ellipsis ">Versione Mobile</div> 
<!---->  </div>

</a>
        <a target="_blank" href="https://github.com/pcm-dpc/COVID-19" role="menu-item" id="ember25" class="dropdown-link dropdown-menu-item ember-view">  <div class="flex-horizontal flex-align-items-center">
<!---->    <div class="flex-fluid text-ellipsis ">Repository dei dati</div> 
<!---->  </div>

</a>

<!---->
</nav>
</div></div>

  </div>

<!---->
<!----></div>
</div>
      <div class="flex-fluid flex-horizontal position-relative overflow-hidden">

          <div id="ember26" class="flex-fluid panel-container flex-vertical left-panel-container slide-over ember-view"><div class="margin-container" style="">
<!---->
  <div class="full-container">
      <div id="ember27" class="full-height left-panel flex-vertical ember-view">  <div class="caption margin-right-1 flex-fix">
    <table border="0" cellpadding="1" cellspacing="1" style="width:100%">
    <tbody>
        <tr>
            <td style="text-align:center"><img alt="" src="http://opendatadpc.maps.arcgis.com/sharing/rest/content/items/b5176eff01df4ff798be038b1dabb09a/data" style="width:200px"></td>
        </tr>
    </tbody>
</table>

<p style="text-align:center"><span style="font-size:14px"><strong>Informazioni</strong></span></p>

<p style="text-align:center">&nbsp;</p>

  </div>
 
<div class="selectors-container flex-fluid flex-vertical overflow-y-auto">
<!----></div>

  <div class="flex-fix description">
    <p><span style="color:#ffffff"><span style="font-size:14px">Il 31 gennaio 2020, il Consiglio dei Ministri dichiara lo stato di emergenza, per la durata di sei mesi, in conseguenza del rischio sanitario connesso all'infezione da Coronavirus.</span></span></p>

<p><span style="color:#ffffff"><span style="font-size:14px">Al Capo del Dipartimento della Protezione Civile, Angelo Borrelli, è affidato il coordinamento degli interventi necessari a fronteggiare l'emergenza sul territorio nazionale.</span></span></p>
.
.
.
.


 