How to extract data from an encoded HTML class using Python

How can I retrieve the page-encoded div class of a web page (the title HTML tag) using Python?

Here is my sample HTML code.

You need to make the request with requests (in most cases it decodes the page automatically) and then use BeautifulSoup to extract the data from the HTML.
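About "in most cases": requests chooses the encoding for response.text from the charset in the Content-Type header. For a text/* response without a charset it falls back to ISO-8859-1, which garbles UTF-8 or Big5 pages; setting response.encoding = response.apparent_encoding before reading response.text is a common workaround. The sketch below shows only the header-driven step, written with the standard library so it runs without requests installed. It is not requests' own code: decode_page is a hypothetical helper, and its UTF-8 fallback is an assumption that differs from requests' ISO-8859-1 default.

```python
from email.message import Message
from typing import Optional


def decode_page(raw: bytes, content_type: Optional[str], default: str = "utf-8") -> str:
    """Decode an HTTP response body using the charset named in its Content-Type header.

    Hypothetical helper: falls back to `default` (an assumption, UTF-8 here)
    when the header has no charset or the charset cannot decode the bytes.
    """
    charset = None
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # lower-cased charset, or None if absent
    try:
        return raw.decode(charset or default)
    except (LookupError, UnicodeDecodeError):
        # unknown charset name, or bytes that do not fit it: fall back to the default
        return raw.decode(default, errors="replace")


body = "Telephone : 01564 773348 電話".encode("big5")
print(decode_page(body, "text/html; charset=Big5"))  # Telephone : 01564 773348 電話
```

Once the page is decoded to a str, that string is what you pass to BeautifulSoup.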

Code and example in the online IDE (only the extraction part is shown, because you haven't provided the website URL):

from bs4 import BeautifulSoup

html = """
<div class="L581yb VICjCf" hjdwnd-ahquyc-r6poud="" jndksc="" l6ctce-pszop"="" l6ctce-purzt="" tabindex=" == $0
&lt;div class=">
</div>
<div class="hJDwNd-AhqUyc-WNfPc purZT-AhqUyC-I15mzb PSzOP-AhqUyc-qWD73c JNdks &lt;div class=" jndksc-smkayb"="">
 <div class="" f570id"="" jsaction="zXBUYD: ZTPCnb; 2QF9Uc: Qxe3nd;
jsname=" jscontroller="SGWD4d">
  &gt;
  <div class="oKdM2C KzvoMe">
   <div class="hJDwNd-AhqUyc-WNFPC PSzOP-AhqUyc- qWD73c jXK9ad D2fZ2 Oj CsFc whaque GNzUNC" id="h.7f5e93de0cf8a767_49">
    <div class="]XK9ad-SmkAyb">
     <div class="ty]Ctd mGzaTb baZpAe">
      <div class="GV3q8e aP9Z7e" id="h.p_9livxd801krd">
      </div>
      <h3 class="CDt4ke zfr3Q OmQG5e" dir="ltr" id="h.p_9livxd801krd" tabindex="-1">
       .
      </h3>
      <div class="GV3q8e aP9z7e" id="h.p JrEgQYpyORCF">
      </div>
      <h3 class="CDt 4Ke zfr3Q OmQG5e" dir="ltr" id="h.p_JrEgQYPYORCF" tabindex="-1">
       <div class="CjVfdc" jsaction="touchstart:UrsOsc; click:Kjs
qPd; focusout:QZoaz; mouseover:yOpDld; mouseout:dq0hvd;fvlRjc:jbFSO
d;CrflRd:SzACGe;" jscontroller="Ae65rd">
        <div class="PPHIP rviiZ" jsname="haAclf">
         .
        </div>
        <span style="font-family: 'Oswald'; font-weight: 500;">
         Telephone : 01564 773348
        </span>
       </div>
      </h3>
      <div class="GV3q8e aP9z7e" id="h.p_sylefz-BOSBX">
      </div>
      &gt;&lt;h3 id="h.p_sylefz-BOSBX" dir="ltr" class="CDt 4Ke zfr3Q OmQG5e"
     </div>
    </div>
   </div>
  </div>
 </div>
</div>
"""

# parse the HTML string with Python's built-in "html.parser"
soup = BeautifulSoup(html, "html.parser")

# grab the phone number (select_one() returns only the first match)
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
print(soup.select_one('.CjVfdc span').text.strip())

# Telephone : 01564 773348


# extract every <div> element with the .L581yb class; select() returns a list
print(soup.select('.L581yb'))

'''
[<div class="L581yb VICjCf" hjdwnd-ahquyc-r6poud="" jndksc="" l6ctce-pszop"="" l6ctce-purzt="" tabindex=" == $0
&lt;div class=">
</div>]
'''


# extract every <div> element with the .hJDwNd-AhqUyc-WNfPc class; select() returns a list
print(soup.select('.hJDwNd-AhqUyc-WNfPc'))

'''
[<div class="hJDwNd-AhqUyc-WNfPc purZT-AhqUyC-I15mzb PSzOP-AhqUyc-qWD73c JNdks &lt;div class=" jndksc-smkayb"="">
<div class="" f570id"="" jsaction="zXBUYD: ZTPCnb; 2QF9Uc: Qxe3nd;
jsname=" jscontroller="SGWD4d">
  &gt;
  <div class="oKdM2C KzvoMe">
<div class="hJDwNd-AhqUyc-WNFPC PSzOP-AhqUyc- qWD73c jXK9ad D2fZ2 Oj CsFc whaque GNzUNC" id="h.7f5e93de0cf8a767_49">
<div class="]XK9ad-SmkAyb">
<div class="ty]Ctd mGzaTb baZpAe">
<div class="GV3q8e aP9Z7e" id="h.p_9livxd801krd">
</div>
<h3 class="CDt4ke zfr3Q OmQG5e" dir="ltr" id="h.p_9livxd801krd" tabindex="-1">
       .
      </h3>
<div class="GV3q8e aP9z7e" id="h.p JrEgQYpyORCF">
</div>
<h3 class="CDt 4Ke zfr3Q OmQG5e" dir="ltr" id="h.p_JrEgQYPYORCF" tabindex="-1">
<div class="CjVfdc" jsaction="touchstart:UrsOsc; click:Kjs
qPd; focusout:QZoaz; mouseover:yOpDld; mouseout:dq0hvd;fvlRjc:jbFSO
d;CrflRd:SzACGe;" jscontroller="Ae65rd">
<div class="PPHIP rviiZ" jsname="haAclf">
         .
        </div>
<span style="font-family: 'Oswald'; font-weight: 500;">
         Telephone : 01564 773348
        </span>
</div>
</h3>
<div class="GV3q8e aP9z7e" id="h.p_sylefz-BOSBX">
</div>
      &gt;&lt;h3 id="h.p_sylefz-BOSBX" dir="ltr" class="CDt 4Ke zfr3Q OmQG5e"
     </div>
</div>
</div>
</div>
</div>
</div>]
'''
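A class selector such as .L581yb matches any element whose class attribute contains L581yb as one of its whitespace-separated tokens, which is why the multi-class div class="L581yb VICjCf" is found. The second select() output contains only one element, even though a nested div carries hJDwNd-AhqUyc-WNFPC (capital "FPC"); that is consistent with class names being compared case-sensitively. The sketch below illustrates this token rule with the standard library's html.parser rather than BeautifulSoup. It is not how BeautifulSoup's CSS engine is implemented, ClassTokenCollector and find_by_class are hypothetical names, and unlike select() it records only each match's tag name and id, not the whole element.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ClassTokenCollector(HTMLParser):
    """Record every start tag whose class attribute contains a given token (hypothetical helper)."""

    def __init__(self, token: str) -> None:
        super().__init__()
        self.token = token
        self.matches: List[Tuple[str, Optional[str]]] = []  # (tag name, id attribute)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # class="a b c" is a list of whitespace-separated tokens
        classes = (attributes.get("class") or "").split()
        if self.token in classes:  # exact, case-sensitive token comparison
            self.matches.append((tag, attributes.get("id")))


def find_by_class(html: str, token: str) -> List[Tuple[str, Optional[str]]]:
    parser = ClassTokenCollector(token)
    parser.feed(html)
    parser.close()
    return parser.matches


sample = """
<div class="hJDwNd-AhqUyc-WNfPc outer">
  <div class="hJDwNd-AhqUyc-WNFPC inner" id="h.7f5e93de0cf8a767_49"></div>
</div>
"""
print(find_by_class(sample, "hJDwNd-AhqUyc-WNfPc"))  # [('div', None)]
print(find_by_class(sample, "hJDwNd-AhqUyc-WNFPC"))  # [('div', 'h.7f5e93de0cf8a767_49')]
```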

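The select_one('.CjVfdc span').text.strip() call returns the whole label, "Telephone : 01564 773348". If you only need the number, one option is a regular expression over that string. This is a minimal sketch assuming the label always reads "Telephone", a colon, then digit groups separated by single spaces. That format is inferred from the single sample, not guaranteed for the real site, and PHONE_RE and parse_phone are hypothetical names.

```python
import re
from typing import Optional

# Assumed label format: "Telephone", optional spaces, ":", optional spaces,
# then digit groups separated by single spaces (captured as group 1).
PHONE_RE = re.compile(r"Telephone\s*:\s*(\d+(?: \d+)*)")


def parse_phone(text: str) -> Optional[str]:
    """Return only the number from a 'Telephone : ...' label, or None if absent."""
    match = PHONE_RE.search(text)
    return match.group(1) if match else None


print(parse_phone("Telephone : 01564 773348"))  # 01564 773348
```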

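Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this page, please credit this site's URL or the address of the original post. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM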