I would like to scrape the following information with Python.
I want the text (1, company_name, 3000)
and (2, company_name, 5000)
from the HTML below.
So the code has to go into the first <tr role="row">...</tr> element,
take this information, then go into the second one, and so on.
```
<tr role="row">...</tr>
<td class="abc"></td>
<td class="text-xs"...> *1* </td>
<td class="comp-name">
<div class="tw-flex">
<a class="justify-between"> *company_name* </a>
</div>
</td>
<td class="price">
<span class="1.0"> *3000* </span>
</td>
```
```
<tr role="row">...</tr>
<td class="abc"></td>
<td class="text-xs"...> *2* </td>
<td class="comp-name">
<div class="tw-flex">
<a class="justify-between"> *company_name* </a>
</div>
</td>
<td class="price">
<span class="1.0"> *5000* </span>
</td>
```
I tried the following (code only for the company_name):
```
for tr in soup.find_all('tr'):
    if tr.has_attr('role'):
        name = soup.find('a', attrs={"class": "justify-between"}).text
        name_list.append(name)
```
But with this code I only get the first company_name on every iteration:
name_list = ['Adidas', 'Adidas', 'Adidas']
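I found your problem. soup.find searches the whole document, so it returns the first matching <a> element on every iteration.
Your code:
1 for tr in soup.find_all('tr'):
2     if tr.has_attr('role'):
3         name = soup.find('a', attrs={"class": "justify-between"}).text
4         name_list.append(name)
Change line 3 so that it searches inside the current row instead:
name = tr.find('a', attrs={"class": "justify-between"}).text
Then you should get a different company name for each row.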
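
The same per-row search can be extended to collect all three fields at once. This is only a minimal sketch: it assumes the td cells are really nested inside each tr (the snippet in the question elides that part), and the markup and the name 'Other Co' are made-up placeholder data.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed markup modelled on the snippet in the question.
# Assumption: the <td> cells are children of each <tr role="row">.
html_doc = """
<table>
  <tr role="row">
    <td class="abc"></td>
    <td class="text-xs">1</td>
    <td class="comp-name"><div class="tw-flex"><a class="justify-between">Adidas</a></div></td>
    <td class="price"><span class="1.0">3000</span></td>
  </tr>
  <tr role="row">
    <td class="abc"></td>
    <td class="text-xs">2</td>
    <td class="comp-name"><div class="tw-flex"><a class="justify-between">Other Co</a></div></td>
    <td class="price"><span class="1.0">5000</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

rows = []
# Only visit <tr> elements that carry role="row".
for tr in soup.find_all("tr", attrs={"role": "row"}):
    # Search inside the current row (tr), not the whole document (soup).
    rank = tr.find("td", class_="text-xs")
    name = tr.find("a", class_="justify-between")
    price = tr.find("td", class_="price")
    # Skip rows where any of the three cells is missing.
    if rank is None or name is None or price is None:
        continue
    rows.append(
        (
            rank.get_text(strip=True),
            name.get_text(strip=True),
            price.get_text(strip=True),
        )
    )

print(rows)
```

With the placeholder markup above, rows holds one (rank, name, price) tuple of strings per table row.
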
You can use this example to see how to extract the information from the HTML source:
from bs4 import BeautifulSoup
html_doc = """
<tr role="row">...</tr>
<td class="abc"></td>
<td class="text-xs"> *1* </td>
<td class="comp-name">
<div class = "tw-flex">
<a class="justify-between"> *company_name* </a>
</div>
</td>
<td class="price">
<span class = "1.0"> *3000* </span>
</td>
<tr role="row">...</tr>
<td class="abc"></td>
<td class="text-xs"> *2* </td>
<td class="comp-name">
<div class = "tw-flex">
<a class="justify-between"> *company_name* </a>
</div>
</td>
<td class="price">
<span class = "1.0"> *5000* </span>
</td>
"""
soup = BeautifulSoup(html_doc, "html.parser")
# Start from each ".text-xs" cell and look forward for the matching name and price.
for td in soup.select(".text-xs"):
    name = td.find_next(class_="justify-between")
    price = td.find_next(class_="price")
    print(
        td.get_text(strip=True),
        name.get_text(strip=True),
        price.get_text(strip=True),
    )
    print("-" * 80)
Prints:
*1* *company_name* *3000*
--------------------------------------------------------------------------------
*2* *company_name* *5000*
--------------------------------------------------------------------------------
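
One thing to keep in mind with find_next is that it searches forward through the rest of the document, not just the current row. The sketch below illustrates this with hypothetical markup in which the cells are nested in their rows and the first row has no price cell; 'Other Co' and the values are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first row is missing its price cell.
html_doc = """
<table>
  <tr role="row">
    <td class="text-xs">1</td>
    <td class="comp-name"><a class="justify-between">Adidas</a></td>
  </tr>
  <tr role="row">
    <td class="text-xs">2</td>
    <td class="comp-name"><a class="justify-between">Other Co</a></td>
    <td class="price"><span>5000</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")
first = soup.select_one(".text-xs")  # the "1" cell in the first row

# find_next keeps going past the end of the row, so it picks up
# the price that belongs to the second row.
leaked = first.find_next(class_="price")
print(leaked.get_text(strip=True))

# Restricting the search to the enclosing <tr> finds nothing instead.
scoped = first.find_parent("tr").find(class_="price")
print(scoped)
```

If rows can have missing cells, scoping the lookup to the enclosing row avoids pairing a company with the price of the next one.
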