How do I scrape a table from a website with Python that doesn't have an ID tag or class?

[Image: code with the table that I want to scrape]

The table lists a "Location" label, but I want to extract the value next to it, "33 Montrose Ave.", for a set of tables like this one. I am using BeautifulSoup and Requests to fetch the URL and parse the HTML. If I could find the "Location" text and then use something like nextSibling to reach the value, that would be great. Thank you!

import requests
from bs4 import BeautifulSoup

# Fetch the property summary page.
website = requests.get(
    "http://wakefield.patriotproperties.com/Summary.asp?AccountNumber=6867")
content = website.content

soup = BeautifulSoup(content, "html.parser")

# Attempt to find the table by an empty class (the tables have no id or class).
table = soup.find('table', {'class': ''})

# Collect the cell text of every row in the first table on the page.
data = soup.select("table")[0]
tab_data = [[item.text for item in row_data.select("th,td")]
            for row_data in data.select("tr")]

The text you are interested in is in the second table: inside a tr, then a td, in the second b tag. You can easily adapt the code below to what you need.

# `soup` is the BeautifulSoup object built in the question's code.
html_table = soup.find_all("table")[1]  # second table on the page
for tr in html_table.find_all("tr"):
    for td in tr.find_all("td"):
        html_bs = td.find_all("b")
        if len(html_bs) < 2:
            continue  # skip cells that have no second <b> tag
        loctext = html_bs[1].text.lstrip()  # text of the second <b>
        print("loctext=", loctext)
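
The question also asks about finding the "Location" text and stepping to the value beside it. Below is a minimal sketch of that idea. It does not fetch the real page: SAMPLE_HTML is hypothetical markup that assumes the label and the value sit in two consecutive b tags, as the answer above suggests, and value_after_label is an illustrative helper name, not part of BeautifulSoup. It uses find_next("b") rather than nextSibling because the direct sibling of a tag is often just a whitespace text node.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumed structure:
# label and value are in two consecutive <b> tags inside the same <td>).
SAMPLE_HTML = """
<table><tr><td>Header table</td></tr></table>
<table>
  <tr>
    <td><b>Location</b> <b> 33 Montrose Ave.</b></td>
    <td><b>Property Account Number</b> <b> 6867</b></td>
  </tr>
</table>
"""


def value_after_label(soup, label):
    """Return the text of the first <b> following the <b> whose text is `label`.

    Returns None if the label is not found or nothing follows it.
    """
    for b in soup.find_all("b"):
        if b.get_text(strip=True) == label:
            nxt = b.find_next("b")  # next <b> in document order
            return nxt.get_text(strip=True) if nxt else None
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
location = value_after_label(soup, "Location")
print(location)
```

Matching on the label text avoids hard-coding table and tag positions, so the same helper can be reused across a set of similar pages, provided the real markup matches the assumed layout.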

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 