Grabbing meta content with Beautiful Soup

Question

Do I need to use regex here?

The content I want looks like:

<meta content="text I want to grab" name="description"/>

However, there are many tags that start with "meta content=", and I want the one that ends in name="description". I'm pretty new to regex, but I thought Beautiful Soup would be able to handle this.

Answer 1

Assuming you have read the HTML contents into a variable named html, first parse the HTML with Beautiful Soup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

Then, to search for <meta content="text I want to grab" name="description"/>, find a tag whose name is 'meta' and whose name attribute is 'description':

def is_meta_description(tag):
    # .get() returns None instead of raising KeyError for tags without a name attribute
    return tag.name == 'meta' and tag.get('name') == 'description'

meta_tag = soup.find(is_meta_description)

You are trying to fetch the content attribute of the tag, so:

content = meta_tag['content']

Since it is a simple search, there is also a simpler way to find the tag:

meta_tag = soup.find('meta', attrs={'name': 'description'})
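
Putting the steps together, here is a minimal sketch of the whole lookup. The sample HTML and the helper name get_meta_description are made up for illustration; only BeautifulSoup, find, and the attribute access come from the answer above. Since find returns None when nothing matches, the sketch checks for that before reading the content attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this example.
html = """
<html><head>
<meta charset="utf-8"/>
<meta content="width=device-width" name="viewport"/>
<meta content="text I want to grab" name="description"/>
</head><body></body></html>
"""

def get_meta_description(html):
    """Return the content of the description meta tag, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    meta_tag = soup.find('meta', attrs={'name': 'description'})
    if meta_tag is None:
        # find() returns None when no tag matches
        return None
    # .get() avoids a KeyError if the tag has no content attribute
    return meta_tag.get('content')

print(get_meta_description(html))  # prints: text I want to grab
```

No regex is needed here: the attrs filter matches on the name attribute regardless of the order in which the attributes appear in the tag.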

solution1
3 ACCEPTED 2018-08-24 01:21:47