How can I scrape HTML with Python when the top-level element repeats?

I would like to scrape the following information with Python:

I want the text (1, company_name, 3000) and (2, company_name, 5000) from the HTML below.

So the code has to go into the first <tr role="row">...</tr> element, take this information, then go into the second one, and so on.

```
<tr role="row">...</tr>
    <td class="abc"></td>
     <td class="text-xs"...> *1* </td>
    <td class="comp-name">
     <div class="tw-flex">
      <a class="justify-between"> *company_name* </a>
     </div>
    <td class="price">
     <span class="1.0"> *3000* </span>
```

```
<tr role="row">...</tr>
    <td class="abc"></td>
     <td class="text-xs"...> *2* </td>
    <td class="comp-name">
     <div class="tw-flex">
      <a class="justify-between"> *company_name* </a>
     </div>
    <td class="price">
     <span class="1.0"> *5000* </span>
```

I tried the following (this code covers only the company_name):

```
for tr in soup.find_all('tr'):
    if tr.has_attr('role'):
        name = soup.find('a', attrs={"class": "justify-between"}).text
        name_list.append(name)
```

But with this code I get only the first company_name on every iteration:

name_list = ['Adidas', 'Adidas', 'Adidas']

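The problem is on line 3 of your loop: soup.find searches the whole document, so it returns the first matching link on every iteration.

Your code:

1 for tr in soup.find_all('tr'):
2     if tr.has_attr('role'):
3         name = soup.find('a', attrs={"class": "justify-between"}).text
4         name_list.append(name)

Change line 3 so that it searches inside the current row instead:

name = tr.find('a', attrs={"class": "justify-between"}).text

Then you should get the company name that belongs to each row.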
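
To make the difference concrete, here is a minimal sketch that runs both calls side by side. The markup is a hypothetical, well-formed stand-in for the real page (it assumes each tr actually encloses its cells), and the second company name is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: every <tr> encloses its own cells.
html_doc = """
<table>
  <tr role="row">
    <td class="text-xs">1</td>
    <td class="comp-name"><div class="tw-flex"><a class="justify-between">Adidas</a></div></td>
    <td class="price"><span>3000</span></td>
  </tr>
  <tr role="row">
    <td class="text-xs">2</td>
    <td class="comp-name"><div class="tw-flex"><a class="justify-between">Nike</a></div></td>
    <td class="price"><span>5000</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

global_names = []  # results of searching the whole document each time
scoped_names = []  # results of searching only inside the current row

for tr in soup.find_all("tr", attrs={"role": "row"}):
    # soup.find always starts at the document root, so it returns the first link.
    global_names.append(soup.find("a", class_="justify-between").get_text(strip=True))
    # tr.find only looks at descendants of this row.
    scoped_names.append(tr.find("a", class_="justify-between").get_text(strip=True))

print(global_names)  # ['Adidas', 'Adidas']
print(scoped_names)  # ['Adidas', 'Nike']
```

If tr.find returns None on the real page, the link is probably not a descendant of that row, and the markup needs a closer look.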

You can use this example to see how to extract the information from the HTML source:

from bs4 import BeautifulSoup

html_doc = """
    <tr role="row">...</tr>
        <td class="abc"></td>
         <td class="text-xs"> *1* </td>
        <td class="comp-name">
         <div class = "tw-flex">
          <a class="justify-between"> *company_name* </a>
         </div>
        </td>
        <td class="price">
         <span class = "1.0"> *3000* </span>
        </td>

   <tr role="row">...</tr>
       <td class="abc"></td>
        <td class="text-xs"> *2* </td>
       <td class="comp-name">
        <div class = "tw-flex">
         <a class="justify-between"> *company_name* </a>
        </div>
       </td>
       <td class="price">
        <span class = "1.0"> *5000* </span>
       </td>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for td in soup.select(".text-xs"):
    name = td.find_next(class_="justify-between")
    price = td.find_next(class_="price")
    print(
        td.get_text(strip=True),
        name.get_text(strip=True),
        price.get_text(strip=True),
    )
    print("-" * 80)

Prints:

*1* *company_name* *3000*
--------------------------------------------------------------------------------
*2* *company_name* *5000*
--------------------------------------------------------------------------------
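
If you want tuples like (1, company_name, 3000) rather than printed lines, a possible variation is to select per row and convert the numbers. This is only a sketch: it assumes the cells are nested inside each tr, that the rank and price cells contain plain integers, and the sample names company_a and company_b are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the third row is incomplete on purpose.
html_doc = """
<tr role="row">
  <td class="text-xs">1</td>
  <td class="comp-name"><a class="justify-between">company_a</a></td>
  <td class="price"><span>3000</span></td>
</tr>
<tr role="row">
  <td class="text-xs">2</td>
  <td class="comp-name"><a class="justify-between">company_b</a></td>
  <td class="price"><span>5000</span></td>
</tr>
<tr role="row">
  <td class="text-xs">3</td>
</tr>
"""

soup = BeautifulSoup(html_doc, "html.parser")

rows = []
for tr in soup.select('tr[role="row"]'):
    rank = tr.select_one("td.text-xs")
    name = tr.select_one("a.justify-between")
    price = tr.select_one("td.price")
    # Skip rows where any of the three cells is missing.
    if rank is None or name is None or price is None:
        continue
    rows.append(
        (
            int(rank.get_text(strip=True)),
            name.get_text(strip=True),
            int(price.get_text(strip=True)),
        )
    )

print(rows)  # [(1, 'company_a', 3000), (2, 'company_b', 5000)]
```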
