BeautifulSoup trying to remove HTML data from list

Question

As the title says, I am trying to strip the HTML from the printed output so that only the text and my | and - separators remain. Right now the output includes span tags and other markup that I would like to remove. This code runs inside a loop, so I cannot search for specific text on the page, because the text changes from page to page. The page structure stays the same, which is why the list indices I print stay the same. What would be the easiest way to clean the output? Here is the relevant code:

        infoLink = driver.find_element_by_xpath("//a[contains(@href, '?tmpl=component&detail=true&parcel=')]").click()
        driver.switch_to.window(driver.window_handles[1])
        aInfo = driver.current_url
        data = requests.get(aInfo)
        src = data.text
        soup = BeautifulSoup(src, "html.parser")
        parsed = soup.find_all("td")
        for item in parsed:
            Original = (parsed[21])
            Owner = parsed[13]
            Address = parsed[17]
            print (*Original, "|",*Owner, "-",*Address)

Example output is:

<span class="detail-text">123 Main St</span> | <span class="detail-text">Banner,Bruce</span> - <span class="detail-text">1313 Mockingbird Lane<br>Santa Monica, CA  90405</br></span>

Thank you!

Answer 1

To get only the text between the tags, use get_text(). Make sure the cells you index actually exist and contain text, otherwise you will get errors (for example, an IndexError if the page has fewer td elements than expected):

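# The indices are fixed, so there is no need to loop over parsed.
Original = parsed[21].get_text(strip=True)
Owner = parsed[13].get_text(strip=True)
# A " " separator keeps the two address lines around <br> from running together.
Address = parsed[17].get_text(" ", strip=True)
print(Original, "|", Owner, "-", Address)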
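As a rough, self-contained illustration of the same idea, the sketch below runs get_text() on a small stand-in table. The HTML string, the cell_text helper, and the indices 0 to 2 are assumptions made for the example; the real page has more cells and uses indices 21, 13 and 17.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, which has many more <td> cells.
html = """
<table><tr>
<td><span class="detail-text">123 Main St</span></td>
<td><span class="detail-text">Banner,Bruce</span></td>
<td><span class="detail-text">1313 Mockingbird Lane<br>Santa Monica, CA 90405</span></td>
</tr></table>
"""


def cell_text(cells, index, default=""):
    """Return the text of cells[index], or default if that cell is missing."""
    if not 0 <= index < len(cells):
        return default
    # separator=" " keeps the text on either side of <br> from running together.
    return cells[index].get_text(" ", strip=True)


cells = BeautifulSoup(html, "html.parser").find_all("td")
print(cell_text(cells, 0), "|", cell_text(cells, 1), "-", cell_text(cells, 2))
```

Returning a default for a missing cell is only one possible choice; raising a clearer error may be preferable if a missing cell means the page layout has changed.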

Answer 2

I wrote an algorithm recently that does something like this. It won't work if your target text has a < or a > in it, though.

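def remove_html_tags(string):
    # Locate the first "<" and the first ">" that follows it.
    start = string.find("<")
    end = string.find(">", start + 1)
    if start == -1 or end == -1:
        # No complete tag left; a stray "<" or ">" is kept as plain text.
        return string.strip()
    # Cut out this one tag and recurse on what remains.
    return remove_html_tags(string[:start] + string[end + 1:])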

It recursively removes everything between < and >, inclusive.
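If hand-rolled string slicing feels fragile, one alternative is to let the standard library's html.parser do the tag detection. This is only a sketch of a different technique, not the function above; the TextExtractor class and strip_tags function are names invented for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True, so &lt; becomes <
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags, never for the tags themselves.
        text = data.strip()
        if text:
            self.parts.append(text)


def strip_tags(fragment):
    """Return the text of fragment with all tags removed, joined by spaces."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered after the last tag
    return " ".join(parser.parts)


print(strip_tags('<span class="detail-text">Banner,Bruce</span>'))
print(strip_tags("1313 Mockingbird Lane<br>Santa Monica, CA 90405"))
```

Because the parser decodes entities, text written as a &lt; b in the HTML comes back as a < b instead of being mistaken for a tag.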

Let me know if this works!
