I tried scraping a table by following the question "Python BeautifulSoup scrape tables".
I adapted the top solution from that question to my page.
HTML code:
<div class="table-frame small">
<table id="rfq-display-line-items-list" class="table">
<thead id="rfq-display-line-items-header">
<tr>
<th>Mfr. Part/Item #</th>
<th>Manufacturer</th>
<th>Product/Service Name</th>
<th>Qty.</th>
<th>Unit</th>
<th>Ship Address</th>
</tr>
</thead>
<tbody id="rfq-display-line-item-0">
<tr>
<td><span class="small">43933</span></td>
<td><span class="small">Anvil International</span></td>
<td><span class="small">Cap Steel Black 1-1/2"</span></td>
<td><span class="small">800</span></td>
<td><span class="small">EA</span></td>
<td><span class="small">1</span></td>
</tr>
<!----><!---->
</tbody><tbody id="rfq-display-line-item-1">
<tr>
<td><span class="small">330035205</span></td>
<td><span class="small">Anvil International</span></td>
<td><span class="small">1-1/2" x 8" Black Steel Nipple</span></td>
<td><span class="small">400</span></td>
<td><span class="small">EA</span></td>
<td><span class="small">1</span></td>
</tr>
<!----><!---->
</tbody><!---->
</table><!---->
</div>
Following that solution, this is what I tried:
for tr in soup.find_all('table', {'id': 'rfq-display-line-items-list'}):
    tds = tr.find_all('td')
    print(tds[0].text, tds[1].text, tds[2].text, tds[3].text, tds[4].text, tds[5].text)
But this printed only the first row:
43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1
I later found out that all of those <td> elements were stored in the tds list. I want to print all the rows.
Expected Output:
43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1
330035205 Anvil International 1-1/2" x 8" Black Steel Nipple 400 EA 1
Start at the <tr> tags and go down to the <td> cells inside each one:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
for tr in soup.find("table", id="rfq-display-line-items-list").find_all("tr"):
    print(" ".join([td.text for td in tr.find_all('td')]))
43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1
330035205 Anvil International 1-1/2" x 8" Black Steel Nipple 400 EA 1
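One detail about the loop above: find_all("tr") also matches the header row inside <thead>, which holds <th> cells rather than <td> cells, so that iteration prints an empty line. A minimal sketch that skips such rows, assuming a shortened two-column copy of the question's table (the shortened markup and the `lines` variable are illustrative, not part of the original answer):

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's table: one header row, two data rows.
html = """
<table id="rfq-display-line-items-list" class="table">
<thead><tr><th>Mfr. Part/Item #</th><th>Manufacturer</th></tr></thead>
<tbody><tr><td><span class="small">43933</span></td><td><span class="small">Anvil International</span></td></tr></tbody>
<tbody><tr><td><span class="small">330035205</span></td><td><span class="small">Anvil International</span></td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="rfq-display-line-items-list")

lines = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if not cells:  # header row: it contains <th>, not <td>
        continue
    lines.append(" ".join(td.get_text(strip=True) for td in cells))

for line in lines:
    print(line)
```

Checking for an empty cell list keeps the loop independent of where the header row sits in the markup.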
You can do that using CSS selectors as follows:
for tr in soup.select('table#rfq-display-line-items-list tbody tr'):
    tds = tr.find_all('td')
    print(tds[0].text, tds[1].text, tds[2].text, tds[3].text, tds[4].text, tds[5].text)
output:
43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1
330035205 Anvil International 1-1/2" x 8" Black Steel Nipple 400 EA 1
When you select the table with find_all(), you get a ResultSet with only one element (the table). That is why your loop iterates only once and prints only the first row.
Select your target more specifically. As an alternative approach, you could also use CSS selectors and stripped_strings to achieve your task.
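To see why the original loop runs only once, it can help to inspect what find_all() returns. The sketch below assumes a shortened two-column copy of the question's table; the variable names are illustrative:

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's table: two data rows, two columns each.
html = """
<table id="rfq-display-line-items-list" class="table">
<tbody><tr><td><span class="small">43933</span></td><td><span class="small">Anvil International</span></td></tr></tbody>
<tbody><tr><td><span class="small">330035205</span></td><td><span class="small">Anvil International</span></td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() on the table id matches the single <table> element.
tables = soup.find_all("table", {"id": "rfq-display-line-items-list"})
n_tables = len(tables)  # the for loop in the question iterates this many times

# Calling find_all('td') on the table returns every cell of every row
# as one flat list, so indexing the first few items only covers row one.
all_tds = tables[0].find_all("td")
n_tds = len(all_tds)
first_row = [td.text for td in all_tds[:2]]

print(n_tables, n_tds, first_row)
```

The loop variable in the question is therefore the whole table rather than a row, even though it is named tr.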
This selects all <tr> elements from the <tbody> sections of the element (the table) with id="rfq-display-line-items-list":
soup.select('#rfq-display-line-items-list tbody tr')
stripped_strings is a generator that yields the whitespace-stripped strings of all the elements (the <td>s) in a row, and you can join() them into a single string:
" ".join(list(row.stripped_strings))
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
for row in soup.select('#rfq-display-line-items-list tbody tr'):
    print(" ".join(list(row.stripped_strings)))
43933 Anvil International Cap Steel Black 1-1/2" 800 EA 1
330035205 Anvil International 1-1/2" x 8" Black Steel Nipple 400 EA 1
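If the rows are needed as data rather than printed text, the same selectors can feed a list of dictionaries keyed by the header cells. This is only a sketch under assumptions: it uses a shortened two-column copy of the question's table, and the `headers` and `records` names are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's table: header row plus two data rows.
html = """
<table id="rfq-display-line-items-list" class="table">
<thead><tr><th>Mfr. Part/Item #</th><th>Manufacturer</th></tr></thead>
<tbody><tr><td><span class="small">43933</span></td><td><span class="small">Anvil International</span></td></tr></tbody>
<tbody><tr><td><span class="small">330035205</span></td><td><span class="small">Anvil International</span></td></tr></tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Column names come from the <th> cells in <thead>.
headers = [th.get_text(strip=True)
           for th in soup.select("#rfq-display-line-items-list thead th")]

records = []
for row in soup.select("#rfq-display-line-items-list tbody tr"):
    # One value per <td>, so a cell with several strings stays a single value.
    values = [td.get_text(strip=True) for td in row.find_all("td")]
    records.append(dict(zip(headers, values)))

for record in records:
    print(record)
```

Reading one value per <td> instead of iterating row.stripped_strings keeps the values aligned with the headers when a cell is empty or holds more than one string.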