
How to find all elements within specific div ID in Beautiful Soup and Selenium

Hey everyone, I'm trying to scrape LinkedIn info. I've got the source code below. I know how to get the info using the section ID, but my problem is that this ID changes on every page refresh:

 <section id="ember443" class="artdeco-card ember-view relative break-words pb3 mt2 " tabindex="-1"><!----> <div id="experience" class="pv-profile-card-anchor"></div> <!----> <div class="pvs-list__outer-container"> <!----> <ul class="pvs-list ph5 display-flex flex-row flex-wrap "> <li class="artdeco-list__item pvs-list__item--line-separated pvs-list__item--one-column"> <!----><div class="pvs-entity pvs-entity--padded pvs-list__item--no-padding-when-nested "> <div> <a data-field="experience_company_logo" class="optional-action-target-wrapper display-flex" target="_self" href="https://www.linkedin.com/company/22316561/"> <div class="ivm-image-view-model pvs-entity__image "> <div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex "> </div> </div> </a> </div> <div class="display-flex flex-column full-width align-self-center"> <div class="display-flex flex-row justify-space-between"> <div class=" display-flex flex-column full-width"> <div class="display-flex align-items-center"> <span class="mr1 t-bold"> <span aria-hidden="true"><!---->CEO &amp; Founder<!----></span><span class="visually-hidden"><!---->CEO &amp; Founder<!----></span> </span> <!----><!----><!----> </div> <span class="t-14 t-normal"> <span aria-hidden="true"><!---->Runa<!----></span><span class="visually-hidden"><!---->Runa<!----></span> </span> <span class="t-14 t-normal t-black--light"> <span aria-hidden="true"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span><span class="visually-hidden"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span> </span> <span class="t-14 t-normal t-black--light"> <span aria-hidden="true"><!---->Mexico City Area, Mexico<!----></span><span class="visually-hidden"><!---->Mexico City Area, Mexico<!----></span> </span>

I've managed to get all the sections with this class using:

 experiences = soup.find_all("section", {"class": "artdeco-card ember-view relative break-words pb3 mt2"})

However, I only need the text from the section that contains the div with id "experience". I've tried:

 div = soup.find_all(id="experience")

But it only gets me that tag and nothing else. Any ideas on how I could get the job info within the specific "experience" section? Thank you so much in advance.

Well, there isn't any text inside the div with id="experience". It is an empty anchor, and the data you want comes after it. So maybe try something like:

expAnchor = soup.find(id="experience")
if expAnchor:  # avoid an AttributeError in case expAnchor is None
    expContainer = expAnchor.find_next('div', {"class": "pvs-list__outer-container"})

Or you could use CSS selectors and get it in one call. The ~ combinator matches a div.pvs-list__outer-container that is a later sibling of #experience:

expContainer = soup.select_one('#experience ~ div.pvs-list__outer-container')
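Once you have the container, pulling the job text out of it could look roughly like the sketch below. The SAMPLE_HTML string is a trimmed-down, hypothetical stand-in for the markup in the question, and experience_entries is an illustrative helper name, not part of Beautiful Soup. It also assumes that every value is duplicated in an aria-hidden span and a visually-hidden span, as in the posted snippet, so it reads only one of the two copies.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the profile markup in the question.
SAMPLE_HTML = """
<section id="ember443" class="artdeco-card ember-view">
  <div id="experience" class="pv-profile-card-anchor"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list">
      <li class="artdeco-list__item pvs-list__item--line-separated">
        <span class="mr1 t-bold">
          <span aria-hidden="true">CEO &amp; Founder</span>
          <span class="visually-hidden">CEO &amp; Founder</span>
        </span>
        <span class="t-14 t-normal">
          <span aria-hidden="true">Runa</span>
          <span class="visually-hidden">Runa</span>
        </span>
      </li>
    </ul>
  </div>
</section>
"""


def experience_entries(html):
    """Return one list of text fields per list item in the experience section."""
    soup = BeautifulSoup(html, "html.parser")
    # Locate the list container that follows the empty #experience anchor.
    container = soup.select_one("#experience ~ div.pvs-list__outer-container")
    if container is None:  # no experience section on this page
        return []
    entries = []
    for item in container.select("li.pvs-list__item--line-separated"):
        # Read only the visually-hidden copy so each value appears once.
        spans = item.select("span.visually-hidden")
        entries.append([span.get_text(strip=True) for span in spans])
    return entries


print(experience_entries(SAMPLE_HTML))  # [['CEO & Founder', 'Runa']]
```

Because the lookup starts from the stable id="experience" anchor, it does not depend on the ember ID that changes on every refresh. On the real page the class names may differ from this sample, so check them against the HTML you actually receive.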

