How to get all links from a page in the DOM?

I use BeautifulSoup in Python to get all links:

links = soup.select('.cover > .card-click-target')
print(links)

But it gives me a list with only one element, and that element looks like a string.

My HTML code is:

<div class="cover">
  <div class="cover-image-container"> 
    <div class="cover-outer-align"> 
      <div class="cover-inner-align"> 
        <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true"> 
      </div>
    </div>
  </div> 
  <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite     ">
    <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none"> 
      <span class="preordered-label">Предзаказ</span>
    </span> 
    <span class="preview-overlay-container">  </span>  
  </a> 
</div>

<div class="cover"> 
  <div class="cover-image-container">
    <div class="cover-outer-align">
      <div class="cover-inner-align"> 
        <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true">
      </div>
    </div> 
  </div> 
  <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite     ">
    <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none">
      <span class="preordered-label">Предзаказ</span>
    </span>
    <span class="preview-overlay-container"> 
    </span>  
  </a>  
</div>

I wouldn't fully trust the CSS selector support in BeautifulSoup. A quick search turns up an answer where updating BeautifulSoup fixed a similar problem.

I would highly recommend writing a function to do the job:

# Membership tests also match tags that carry additional classes.
links = soup.find_all(lambda tag: 'cover' in tag.parent.get('class', [])
                      and 'card-click-target' in tag.get('class', []))

The lambda matches every tag that has the class card-click-target and whose direct parent has the class cover.
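The same filter can also be written as a named function, which is easier to read and test than a lambda. This is a minimal sketch, not the answerer's exact code: the name is_cover_link and the trimmed-down HTML (including the extra "other" block) are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the question's markup (hypothetical sample).
html = """
<div class="cover">
  <a class="card-click-target" href="/s/kate_new_6"></a>
</div>
<div class="cover">
  <a class="card-click-target" href="/s/kate_new_6"></a>
</div>
<div class="other">
  <a class="card-click-target" href="/s/ignored"></a>
</div>
"""

def is_cover_link(tag):
    """True for a card-click-target tag whose direct parent has class cover."""
    parent = tag.parent
    return (
        "card-click-target" in tag.get("class", [])
        and parent is not None
        and "cover" in parent.get("class", [])
    )

soup = BeautifulSoup(html, "html.parser")
# find_all calls the function once per tag and keeps the tags it accepts.
links = [tag["href"] for tag in soup.find_all(is_cover_link)]
print(links)  # ['/s/kate_new_6', '/s/kate_new_6']
```

The link inside the "other" block is skipped because its parent does not have the class cover.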

link_tags = soup.find_all('a', class_="card-click-target")
links = [i.get('href') for i in link_tags]

out:

['/s/kate_new_6', '/s/kate_new_6']

select version:

link_tags = soup.select('.cover .card-click-target')
links = [i.get('href') for i in link_tags]
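Note that this selector uses a space (descendant combinator), while the question used ">" (child combinator). For the question's HTML both give the same result, but they differ once a link is nested deeper. A small sketch with made-up markup, where the wrapper div and the href values are assumptions for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link is wrapped in an extra <div>.
html = """
<div class="cover">
  <a class="card-click-target" href="/s/direct"></a>
</div>
<div class="cover">
  <div class="wrapper">
    <a class="card-click-target" href="/s/nested"></a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator (space): matches at any depth below .cover.
descendant_links = [a["href"] for a in soup.select(".cover .card-click-target")]

# Child combinator (>): matches only direct children of .cover.
child_links = [a["href"] for a in soup.select(".cover > .card-click-target")]

print(descendant_links)  # ['/s/direct', '/s/nested']
print(child_links)       # ['/s/direct']
```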

Check this example:

>>> s = """    <div class="cover">
       <div class="cover-image-container">
         <div class="cover-outer-align">
           <div class="cover-inner-align">
             <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true">
           </div>
         </div>
       </div>
       <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite     ">
         <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none">
           <span class="preordered-label">Предзаказ</span>
         </span>
         <span class="preview-overlay-container">  </span>
       </a>
     </div>

     <div class="cover">
       <div class="cover-image-container">
         <div class="cover-outer-align">
           <div class="cover-inner-align">
             <img alt="Kate Mobile Lite" class="cover-image" data-cover-large="" data-cover-small="" src="" aria-hidden="true">
           </div>
         </div>
       </div>
       <a class="card-click-target" href="/s/kate_new_6" aria-label=" Kate Mobile Lite     ">
         <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none">
           <span class="preordered-label">Предзаказ</span>
         </span>
         <span class="preview-overlay-container">
         </span>
       </a>
     </div>"""
>>> from bs4 import BeautifulSoup
>>> sp = BeautifulSoup(s, "html.parser")
>>> sp.select(".cover > a.card-click-target")
[<a aria-label=" Kate Mobile Lite     " class="card-click-target" href="/s/kate_new_6">
 <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none">
 <span class="preordered-label">Предзаказ</span>
 </span>
 <span class="preview-overlay-container"> </span>
 </a>,
 <a aria-label=" Kate Mobile Lite     " class="card-click-target" href="/s/kate_new_6">
 <span class="movies preordered-overlay-container id-preordered-overlay-container" style="display:none">
 <span class="preordered-label">Предзаказ</span>
 </span>
 <span class="preview-overlay-container">
 </span>
 </a>]

>>> len(sp.select(".cover > a.card-click-target"))
2
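The items that select returns are Tag objects, even though they print as HTML text, so attributes can be read from them directly. A short sketch, assuming a one-link version of the markup above (the variable names are illustrative only):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# One-link stand-in for the markup in the session above.
html = (
    '<div class="cover">'
    '<a class="card-click-target" href="/s/kate_new_6" '
    'aria-label=" Kate Mobile Lite     "></a>'
    '</div>'
)

soup = BeautifulSoup(html, "html.parser")
matches = soup.select(".cover > a.card-click-target")

first = matches[0]
is_tag = isinstance(first, Tag)               # a Tag, not a str
href = first["href"]                          # raises KeyError if the attribute is missing
label = first.get("aria-label", "").strip()   # .get returns the default if it is missing

print(is_tag, href, label)  # True /s/kate_new_6 Kate Mobile Lite
```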
