[英]How to extract the data from encoded HTML class using python
您需要使用requests
發出請求(它會自動解碼頁面,在大多數情況下),並使用 beautifulsoup 從beautifulsoup
中提取數據。
在線 IDE 中的代碼和示例(僅顯示提取部分,因為您尚未提供網站 URL):
from bs4 import BeautifulSoup
html = """
<div class="L581yb VICjCf" hjdwnd-ahquyc-r6poud="" jndksc="" l6ctce-pszop"="" l6ctce-purzt="" tabindex=" == $0
<div class=">
</div>
<div class="hJDwNd-AhqUyc-WNfPc purZT-AhqUyC-I15mzb PSzOP-AhqUyc-qWD73c JNdks <div class=" jndksc-smkayb"="">
<div class="" f570id"="" jsaction="zXBUYD: ZTPCnb; 2QF9Uc: Qxe3nd;
jsname=" jscontroller="SGWD4d">
>
<div class="oKdM2C KzvoMe">
<div class="hJDwNd-AhqUyc-WNFPC PSzOP-AhqUyc- qWD73c jXK9ad D2fZ2 Oj CsFc whaque GNzUNC" id="h.7f5e93de0cf8a767_49">
<div class="]XK9ad-SmkAyb">
<div class="ty]Ctd mGzaTb baZpAe">
<div class="GV3q8e aP9Z7e" id="h.p_9livxd801krd">
</div>
<h3 class="CDt4ke zfr3Q OmQG5e" dir="ltr" id="h.p_9livxd801krd" tabindex="-1">
.
</h3>
<div class="GV3q8e aP9z7e" id="h.p JrEgQYpyORCF">
</div>
<h3 class="CDt 4Ke zfr3Q OmQG5e" dir="ltr" id="h.p_JrEgQYPYORCF" tabindex="-1">
<div class="CjVfdc" jsaction="touchstart:UrsOsc; click:Kjs
qPd; focusout:QZoaz; mouseover:yOpDld; mouseout:dq0hvd;fvlRjc:jbFSO
d;CrflRd:SzACGe;" jscontroller="Ae65rd">
<div class="PPHIP rviiZ" jsname="haAclf">
.
</div>
<span style="font-family: 'Oswald'; font-weight: 500;">
Telephone : 01564 773348
</span>
</div>
</h3>
<div class="GV3q8e aP9z7e" id="h.p_sylefz-BOSBX">
</div>
><h3 id="h.p_sylefz-BOSBX" dir="ltr" class="CDt 4Ke zfr3Q OmQG5e"
</div>
</div>
</div>
</div>
</div>
</div>
"""
# pass HTML to BeautifulSoup object and assign a html.parser as a HTML parser
soup = BeautifulSoup(html, "html.parser")
# grab a phone number (only first occurrence will be extracted)
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
print(soup.select_one('.CjVfdc span').text.strip())
# Telephone : 01564 773348
# extract <div> element with .L581yb class. returns a list()
print(soup.select('.L581yb'))
'''
[<div class="L581yb VICjCf" hjdwnd-ahquyc-r6poud="" jndksc="" l6ctce-pszop"="" l6ctce-purzt="" tabindex=" == $0
<div class=">
</div>]
'''
# extract <div> element with .hJDwNd-AhqUyc-WNfPc class. returns a list()
print(soup.select('.hJDwNd-AhqUyc-WNfPc'))
'''
[<div class="hJDwNd-AhqUyc-WNfPc purZT-AhqUyC-I15mzb PSzOP-AhqUyc-qWD73c JNdks <div class=" jndksc-smkayb"="">
<div class="" f570id"="" jsaction="zXBUYD: ZTPCnb; 2QF9Uc: Qxe3nd;
jsname=" jscontroller="SGWD4d">
>
<div class="oKdM2C KzvoMe">
<div class="hJDwNd-AhqUyc-WNFPC PSzOP-AhqUyc- qWD73c jXK9ad D2fZ2 Oj CsFc whaque GNzUNC" id="h.7f5e93de0cf8a767_49">
<div class="]XK9ad-SmkAyb">
<div class="ty]Ctd mGzaTb baZpAe">
<div class="GV3q8e aP9Z7e" id="h.p_9livxd801krd">
</div>
<h3 class="CDt4ke zfr3Q OmQG5e" dir="ltr" id="h.p_9livxd801krd" tabindex="-1">
.
</h3>
<div class="GV3q8e aP9z7e" id="h.p JrEgQYpyORCF">
</div>
<h3 class="CDt 4Ke zfr3Q OmQG5e" dir="ltr" id="h.p_JrEgQYPYORCF" tabindex="-1">
<div class="CjVfdc" jsaction="touchstart:UrsOsc; click:Kjs
qPd; focusout:QZoaz; mouseover:yOpDld; mouseout:dq0hvd;fvlRjc:jbFSO
d;CrflRd:SzACGe;" jscontroller="Ae65rd">
<div class="PPHIP rviiZ" jsname="haAclf">
.
</div>
<span style="font-family: 'Oswald'; font-weight: 500;">
Telephone : 01564 773348
</span>
</div>
</h3>
<div class="GV3q8e aP9z7e" id="h.p_sylefz-BOSBX">
</div>
><h3 id="h.p_sylefz-BOSBX" dir="ltr" class="CDt 4Ke zfr3Q OmQG5e"
</div>
</div>
</div>
</div>
</div>
</div>]
'''
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.