
How to find all elements within specific div ID in Beautiful Soup and Selenium

Hey everyone, I'm trying to scrape LinkedIn info. I've got the source code below. My problem is that I know how to get the info using the section ID, but this ID changes on every page refresh.

 <section id="ember443" class="artdeco-card ember-view relative break-words pb3 mt2 " tabindex="-1"><!---->
   <div id="experience" class="pv-profile-card-anchor"></div>
   <!---->
   <div class="pvs-list__outer-container">
     <!---->
     <ul class="pvs-list ph5 display-flex flex-row flex-wrap ">
       <li class="artdeco-list__item pvs-list__item--line-separated pvs-list__item--one-column">
         <!----><div class="pvs-entity pvs-entity--padded pvs-list__item--no-padding-when-nested ">
           <div>
             <a data-field="experience_company_logo" class="optional-action-target-wrapper display-flex" target="_self" href="https://www.linkedin.com/company/22316561/">
               <div class="ivm-image-view-model pvs-entity__image ">
                 <div class="ivm-view-attr__img-wrapper ivm-view-attr__img-wrapper--use-img-tag display-flex ">
                 </div>
               </div>
             </a>
           </div>
           <div class="display-flex flex-column full-width align-self-center">
             <div class="display-flex flex-row justify-space-between">
               <div class=" display-flex flex-column full-width">
                 <div class="display-flex align-items-center">
                   <span class="mr1 t-bold">
                     <span aria-hidden="true"><!---->CEO &amp; Founder<!----></span><span class="visually-hidden"><!---->CEO &amp; Founder<!----></span>
                   </span>
                   <!----><!----><!---->
                 </div>
                 <span class="t-14 t-normal">
                   <span aria-hidden="true"><!---->Runa<!----></span><span class="visually-hidden"><!---->Runa<!----></span>
                 </span>
                 <span class="t-14 t-normal t-black--light">
                   <span aria-hidden="true"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span><span class="visually-hidden"><!---->Jan 2018 - Present · 4 yrs 10 mos<!----></span>
                 </span>
                 <span class="t-14 t-normal t-black--light">
                   <span aria-hidden="true"><!---->Mexico City Area, Mexico<!----></span><span class="visually-hidden"><!---->Mexico City Area, Mexico<!----></span>
                 </span>

I've managed to get all the sections with this class using:

 experiences = soup.find_all("section", {"class": "artdeco-card ember-view relative break-words pb3 mt2"})

However, I need the text within the section that contains the div with id "experience". I've tried:

 div = soup.find_all(id="experience")

But that only gets me the tag itself and nothing else. Any ideas on how I could get the jobs info within the specific "experience" section? Thank you so much in advance.

Well, there isn't any text inside the div with id="experience" - it is an empty anchor, and the data you want comes after it. So maybe try something like

expAnchor = soup.find(id="experience")
expContainer = None  # stays None if the anchor is not found
if expAnchor:  # avoid an AttributeError in case expAnchor is None
    expContainer = expAnchor.find_next('div', {"class": "pvs-list__outer-container"})
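
To see why the anchor alone comes back empty and what find_next picks up, here is a minimal sketch on a trimmed-down copy of the markup. The html string, the snake_case names and the choice of "html.parser" are assumptions for illustration, not the real page:

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the profile markup (an assumption for illustration).
html = """
<section id="ember443" class="artdeco-card ember-view">
  <div id="experience" class="pv-profile-card-anchor"></div>
  <div class="pvs-list__outer-container">
    <ul class="pvs-list">
      <li><span aria-hidden="true">CEO &amp; Founder</span></li>
      <li><span aria-hidden="true">Runa</span></li>
    </ul>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# The anchor div exists but has no children, so its text is empty.
exp_anchor = soup.find(id="experience")
anchor_text = exp_anchor.get_text(strip=True)

# find_next walks forward in document order to the first matching div.
exp_container = exp_anchor.find_next("div", {"class": "pvs-list__outer-container"})

# Collect the non-empty text fragments inside the container.
job_lines = list(exp_container.stripped_strings)

print(repr(anchor_text))
print(job_lines)
```

Since find_next searches everything that follows the anchor in the document, not just its siblings, it is worth checking that the returned container really belongs to the experience section when a page has several similar cards.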

Or, you could use CSS selectors and get it in one call like:

expContainer = soup.select_one('#experience ~ div.pvs-list__outer-container')
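
Once you have the container, note that each value appears twice in the markup: once in a span with aria-hidden="true" and once in a span with class "visually-hidden". A minimal sketch of pulling each value only once, and of reaching the enclosing section without knowing its changing ember ID, could look like this. The html string is a shortened assumption based on the snippet in the question:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (an assumption for illustration).
html = """
<section id="ember443" class="artdeco-card ember-view">
  <div id="experience" class="pv-profile-card-anchor"></div>
  <div class="pvs-list__outer-container">
    <span class="mr1 t-bold">
      <span aria-hidden="true">CEO &amp; Founder</span><span class="visually-hidden">CEO &amp; Founder</span>
    </span>
    <span class="t-14 t-normal">
      <span aria-hidden="true">Runa</span><span class="visually-hidden">Runa</span>
    </span>
  </div>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# "~" matches a following sibling of the element with id="experience".
exp_container = soup.select_one("#experience ~ div.pvs-list__outer-container")

# Keep only the aria-hidden copies so every value is collected once.
visible = [
    span.get_text(strip=True)
    for span in exp_container.select('span[aria-hidden="true"]')
]

# The enclosing section can be reached from the container, whatever its id is.
section = exp_container.find_parent("section")

print(visible)
print(section["id"])
```

Selecting the visually-hidden spans instead would work just as well; the point is to pick one of the two copies rather than calling get_text on the whole container.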


 