
BeautifulSoup HTML table scrape only returns the last row

I am attempting a simple scrape of an HTML table using BeautifulSoup with the following:

import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    page = urllib.request.urlopen(url)
    sdata = BeautifulSoup(page, 'html.parser')
    return sdata

url = 'http://www.satp.org/satporgtp/countries/pakistan/database/bombblast.htm'
soup = make_soup(url)

table = soup.findAll('table', attrs={'class':'pagraph1'})
table = table[0]

trows = table.findAll('tr')
bbdata_ = []
bbdata = []
for trow in trows:
    bbdata_ = trow.findAll('td')
    bbdata = [ele.text.strip() for ele in bbdata_]

print(bbdata)

However, I can only extract the last row in the table, i.e.:

['Total*', '369', '1032+']

All of the rows are present in trows, so I must be forming my loop incorrectly, but I am not sure how.

Your problem is here:

bbdata = [ele.text.strip() for ele in bbdata_]

You want to append each row to the list (or extend it) rather than reassign the list:

bbdata.append([ele.text.strip() for ele in bbdata_])

You are overwriting bbdata on each pass through the loop, which is why it ends up holding only the final row.
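
To see the difference between appending and extending, here is a minimal sketch that runs offline. It assumes a small inline HTML string in place of the real page: only the 'Total*' row comes from the question, and the other rows are made-up placeholders. The names rows and cells are illustrative and are not from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page so the sketch runs without network access.
# Only the last row is taken from the question; the others are placeholders.
html = """
<table class="pagraph1">
  <tr><td>Year</td><td>Incidents</td><td>Killed</td></tr>
  <tr><td>placeholder</td><td>1</td><td>2</td></tr>
  <tr><td>Total*</td><td>369</td><td>1032+</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs={'class': 'pagraph1'})

rows = []   # one inner list per <tr>
cells = []  # every cell in a single flat list
for trow in table.find_all('tr'):
    row = [td.text.strip() for td in trow.find_all('td')]
    rows.append(row)   # append keeps each row as its own list
    cells.extend(row)  # extend flattens all cells into one list

print(rows)
print(cells)
```

With append, rows holds one list per table row, which is usually what you want for tabular data. With extend, cells is a single flat list with no row boundaries.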

