
Beautiful Soup meta content tag

<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> 

I have to get the content '4103 Beach Bluff Rd'. I'm trying to do this with BeautifulSoup, so I tried this:

soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ')

soup.find(itemprop="streetAddress").get_text()

but I'm getting an empty string as the result, which may make sense given that when I print the soup object

print(soup)

I get this:

<html><head><meta content="4103 Beach Bluff Rd" itemprop="streetAddress"/> </head></html>

Apparently the data I want is in the "content" attribute of the meta tag. How can I get this data?

soup.find(itemprop="streetAddress").get_text()

You are getting the text of the matched element, and a meta tag has no text. Instead, get the "content" attribute value:

soup.find(itemprop="streetAddress").get("content")

This is possible because BeautifulSoup provides a dictionary-like interface to tag attributes:

You can access a tag's attributes by treating the tag like a dictionary.
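As a small sketch of what that dictionary-like access looks like in practice (the explicit "html.parser" argument and the "data-missing" attribute name are assumptions chosen for illustration, not part of the original question):

```python
from bs4 import BeautifulSoup

html = '<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">'
# Naming the parser explicitly is an assumption here; it avoids the
# "no parser was explicitly specified" warning and keeps results consistent.
soup = BeautifulSoup(html, "html.parser")
tag = soup.find(itemprop="streetAddress")

# Subscript access behaves like a dict lookup and raises KeyError if absent.
street = tag["content"]

# .get() returns None instead of raising when the attribute is absent.
# "data-missing" is a hypothetical attribute that the tag does not have.
missing = tag.get("data-missing")

# .attrs exposes all attributes of the tag as a dictionary.
all_attrs = dict(tag.attrs)

print(street)
print(missing)
print(all_attrs)
```

Subscript access is convenient when the attribute is known to exist; .get() is the safer choice when it may be absent.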

Demo:

>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ', 'html.parser')
>>> soup.find(itemprop="streetAddress").get_text()
''
>>> soup.find(itemprop="streetAddress").get("content")
'4103 Beach Bluff Rd'
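
One caveat with the one-liner above: find() returns None when nothing matches, so chaining .get("content") onto it raises AttributeError on pages that lack the tag. A minimal sketch of a guarded lookup follows; the helper name meta_content, the second meta tag, and its "Example City" value are hypothetical and only there for illustration.

```python
from bs4 import BeautifulSoup


def meta_content(soup, itemprop):
    """Return the content of the first <meta> with this itemprop, or None."""
    tag = soup.find("meta", attrs={"itemprop": itemprop})
    # find() returns None when no tag matches, so guard before calling .get().
    return tag.get("content") if tag is not None else None


# The second meta tag and its value are made-up sample data.
html = """
<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">
<meta itemprop="addressLocality" content="Example City">
"""
soup = BeautifulSoup(html, "html.parser")

street = meta_content(soup, "streetAddress")
postal = meta_content(soup, "postalCode")  # not present in the sample HTML

# Collect every itemprop/content pair; itemprop=True matches any tag
# that carries an itemprop attribute, whatever its value.
all_props = {
    m["itemprop"]: m.get("content")
    for m in soup.find_all("meta", itemprop=True)
}

print(street)
print(postal)
print(all_props)
```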
