How to get meta tag value with BeautifulSoup soup.select

I want to extract the date from an HTML tag. I'm using Python and Beautiful Soup.

<meta name="Email" content="info@info.de">
<meta name="Date" content="2021-04-28T20:35:00+02:00">
<meta name="title" content="Tris is tite">

I want to extract only the date, so the result should be: 2021-04-28T20:35:00+02:00

I know that I can do it like this:

tag = "meta['Date']"
date = soup.select(tag)
date = date['content']

But is it possible to do that with a single CSS selector that returns only the attribute value? For example, something like this?

tag = "meta['Date']['content']" # or something like this?
date = soup.select(tag)
print(date)
2021-04-28T20:35:00+02:00

PS: I have to use soup.select. Neither soup.find(...) nor soup.select_one works for me, so only soup.select is an option.

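You are almost there. It seems you are confusing the attribute names: name is the attribute whose value is "Date", and content is the attribute that holds the value you want. Try:

tag = "meta[name='Date'][content]"
date = soup.select_one(tag)
print(date.get('content'))

Output:

2021-04-28T20:35:00+02:00

Note that a selector of just "meta[content]" would match the first meta tag that has a content attribute, which in your sample is the Email tag, so the name='Date' part is required.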
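
Since the question is restricted to soup.select, here is a minimal sketch of the same idea using only that method. A CSS selector matches elements rather than attribute values, so as far as I know the content attribute still has to be read from the matched tag afterwards. The html string simply repeats the sample from the question, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<meta name="Email" content="info@info.de">
<meta name="Date" content="2021-04-28T20:35:00+02:00">
<meta name="title" content="Tris is tite">
"""

soup = BeautifulSoup(html, "html.parser")

# select() always returns a list of Tag objects, so the attribute
# is read from each match after the selector has run.
dates = [tag["content"] for tag in soup.select('meta[name="Date"][content]')]

# Guard against an empty result instead of indexing blindly.
print(dates[0] if dates else None)
```

The list comprehension keeps everything in one expression, which is probably as close to "one selector, one value" as soup.select allows.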

You can simply use:

>>> from bs4 import BeautifulSoup
>>> content = """<meta name="Date" content="2021-04-28T20:35:00+02:00"> """
>>> soup = BeautifulSoup(content, 'html.parser')
>>> soup.find("meta", {"name": "Date"}).attrs['content']
'2021-04-28T20:35:00+02:00'

If you want to extract all 'meta' tags and display their content attributes (the Date among them) using 'select':

>>> for item in soup.select("meta"):
...     print(item.attrs.get('content'))
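
If several meta values are needed, one possible variation is to collect them into a dictionary keyed by the name attribute and then look up "Date". This is only a sketch: the html string repeats the sample from the question, and meta_values is a hypothetical variable name.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<meta name="Email" content="info@info.de">
<meta name="Date" content="2021-04-28T20:35:00+02:00">
<meta name="title" content="Tris is tite">
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only meta tags that carry both attributes, then map name -> content.
meta_values = {
    tag["name"]: tag["content"]
    for tag in soup.select("meta[name][content]")
}

# .get() returns None instead of raising if the page has no Date tag.
print(meta_values.get("Date"))
```

Requiring both attributes in the selector avoids a KeyError on meta tags such as charset declarations that have neither.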
