How to get the first child div in Python using bs4 soup.select within a dynamic table

In the HTML below, I have been unable to use Beautiful Soup's soup.select to obtain only the first child of each div class="wrap-25PNPwRV" (i.e. -11.94M and 2.30M) as a list.

<div class="value-25PNPwRV">
   <div class="wrap-25PNPwRV">
      <div>‪−11.94M‬</div>
      <div class="change-25PNPwRV negative-25PNPwRV">−119.94%</div></div></div>

<div class="value-25PNPwRV additional-25PNPwRV">
   <div class="wrap-25PNPwRV">
      <div>‪2.30M‬</div>
      <div class="change-25PNPwRV negative-25PNPwRV">−80.17%</div></div></div>

The above are just two examples from the HTML I'm trying to scrape. They sit inside a dynamic, JavaScript-rendered table; the page has many more div elements, and the table contains many more div class="wrap-25PNPwRV" elements.

I currently have the code below, which scrapes all the contents of each div class="wrap-25PNPwRV":

data_list = [elem.get_text() for elem in soup.select("div.wrap-25PNPwRV")]

Output:

['-11.94M', '-119.94%', '2.30M', '-80.17%']

However, I would like to use soup.select to yield the desired output:

['-11.94M', '2.30M']

I tried following the guide at https://www.crummy.com/software/BeautifulSoup/bs4/doc/ but have not been able to apply it to my code above.

Please note: if this is not possible with soup.select, I am happy to use an alternative, provided it generates the same list output.

You can use the :nth-of-type CSS selector:

data_list = [elem.get_text() for elem in soup.select(".wrap-25PNPwRV div:nth-of-type(1)")]
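
For reference, here is a minimal self-contained sketch of the :nth-of-type approach, run against the two sample blocks from the question. Two details are my own assumptions rather than part of the answer above: the `>` child combinator, which restricts the match to direct children of each wrapper so that any deeper nested div would not be picked up, and `get_text(strip=True)` to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (two of the many table cells).
html = """
<div class="value-25PNPwRV">
   <div class="wrap-25PNPwRV">
      <div>−11.94M</div>
      <div class="change-25PNPwRV negative-25PNPwRV">−119.94%</div></div></div>

<div class="value-25PNPwRV additional-25PNPwRV">
   <div class="wrap-25PNPwRV">
      <div>2.30M</div>
      <div class="change-25PNPwRV negative-25PNPwRV">−80.17%</div></div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the first <div> among the direct children of each wrapper.
selector = ".wrap-25PNPwRV > div:nth-of-type(1)"
data_list = [elem.get_text(strip=True) for elem in soup.select(selector)]

print(data_list)  # one value per wrapper; the percentage changes are skipped
```

On a JavaScript-rendered page, this only works once `soup` has been built from the rendered HTML, not from the raw response of a plain HTTP request.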

I'd suggest not relying on the .wrap-25PNPwRV class. The suffix looks auto-generated and will almost certainly change in the future.

Instead, select the <div> element whose next sibling is an element with class="change...". For example:

print([t.text.strip() for t in soup.select('div:has(+ [class^="change"])')])

Prints:

['−11.94M', '2.30M']
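
A runnable sketch of this sibling-based selector follows. The sample markup is modelled on the question. The `\u202a` and `\u202c` characters are an assumption on my part: they stand in for invisible directional marks that a page like this may wrap around its values, and the extra `.strip("\u202a\u202c")` removes them if they are present.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question. \u2212 is the minus sign;
# \u202a / \u202c are assumed invisible directional marks around each value.
html = """
<div class="wrap-25PNPwRV">
  <div>\u202a\u221211.94M\u202c</div>
  <div class="change-25PNPwRV negative-25PNPwRV">\u2212119.94%</div>
</div>
<div class="wrap-25PNPwRV">
  <div>\u202a2.30M\u202c</div>
  <div class="change-25PNPwRV negative-25PNPwRV">\u221280.17%</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match each <div> that is immediately followed by a sibling element
# whose class attribute starts with "change".
cells = soup.select('div:has(+ [class^="change"])')

# Trim whitespace first, then any directional marks left around the value.
values = [cell.get_text().strip().strip("\u202a\u202c") for cell in cells]

print(values)
```

Note that `[class^="change"]` tests the start of the whole class attribute string, so it relies on the "change..." class being listed first. If the class order might vary, `[class*="change-"]` is a looser alternative.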
