How do you search for a string in a BeautifulSoup object?

Question

I am checking Craigslist postings to see if they have been flagged for removal. My script is pretty simple:

import requests
from bs4 import BeautifulSoup

def check_if_flagged(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    return ('flagged for removal' in soup)

The problem is that I have a URL which I know for a fact has been flagged for removal, but check_if_flagged returns False. Is this the correct way to search a BeautifulSoup object for a substring? Is there a better way? Please let me know if you can reproduce this behavior.

Here is the URL for reference: 'https://newyork.craigslist.org/brk/apa/d/brooklyn-1-bedroom-1-bath-apt-located/7206865558.html'

Answer 1

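The in operator on a BeautifulSoup object checks whether the value is one of the object's direct children, not whether it occurs as a substring of the page text, which is why your test returns False. To search the text in soup, pass a function as the text= argument of soup.find() (this argument is called string= in newer versions of Beautiful Soup). Or you can simply search the returned HTML as a string: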

import requests
from bs4 import BeautifulSoup

def check_if_flagged(url):
    # Search the raw HTML as a plain string; no parsing needed.
    page = requests.get(url).text
    return 'this posting has been flagged for removal' in page.lower()

def check_if_flagged2(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    # find() calls the function on each text node and returns the first match, or None.
    return bool(soup.find(text=lambda t: 'this posting has been flagged for removal' in t.lower()))

url = 'https://newyork.craigslist.org/brk/apa/d/brooklyn-1-bedroom-1-bath-apt-located/7206865558.html'
print(check_if_flagged(url))
print(check_if_flagged2(url))

Prints:

True
True
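
For anyone who wants to see the difference without hitting Craigslist, here is a minimal offline sketch. The HTML snippet, the PHRASE constant and the is_flagged helper are made up for illustration and are not the real Craigslist markup; the point is only to contrast the membership test on the soup with a substring test on soup.get_text().

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a flagged posting; NOT the real Craigslist markup.
SAMPLE_HTML = """
<html><body>
  <div class="removed">
    <h2>This posting has been flagged for removal.</h2>
  </div>
</body></html>
"""

# Phrase to look for (lowercase, because the page text is lowercased below).
PHRASE = "flagged for removal"


def is_flagged(html):
    """Return True if PHRASE occurs anywhere in the text of the page."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins all text nodes into one plain string,
    # so an ordinary substring test works on the result.
    return PHRASE in soup.get_text().lower()


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Membership test against the soup's direct children: not a text search.
print(PHRASE in soup)

# Substring test on the extracted text.
print(is_flagged(SAMPLE_HTML))
```

With this sample, the first print should give False and the second True. Searching get_text() ignores tags and attributes, whereas searching the raw response text (as in check_if_flagged above) would also match the phrase inside scripts or attribute values; which one is appropriate depends on the page.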

solution1
0 2020-10-27 17:47:57
