Beautiful Soup meta content tag
<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">
I need to get the content '4103 Beach Bluff Rd'. I'm trying to do this with BeautifulSoup, so I'm trying this:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ')
soup.find(itemprop="streetAddress").get_text()
but I'm getting an empty string as the result, which may make sense given that when I print the soup object
print(soup)
I get this:
<html><head><meta content="4103 Beach Bluff Rd" itemprop="streetAddress"/> </head></html>
Apparently the data I want is in the 'content' attribute of the meta tag. How can I get this data?
soup.find(itemprop="streetAddress").get_text()
You are getting the text of the matched element, and a <meta> tag has no text content, so get_text() returns an empty string. Instead, get the value of its "content" attribute:
soup.find(itemprop="streetAddress").get("content")
This works because BeautifulSoup provides a dictionary-like interface to tag attributes:
You can access a tag's attributes by treating the tag like a dictionary.
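As a rough sketch of that dictionary-like interface, the snippet below contrasts square-bracket access, .get() and .attrs on the same tag. It assumes Python 3 and the built-in "html.parser" (the original does not name a parser), and "data-missing" is a hypothetical attribute name used only to show what happens when an attribute is absent.

```python
from bs4 import BeautifulSoup

html = '<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">'
# "html.parser" ships with Python, so no extra parser needs to be installed
soup = BeautifulSoup(html, "html.parser")
tag = soup.find(itemprop="streetAddress")

# Square brackets behave like dict indexing
address = tag["content"]

# A missing attribute raises KeyError with square brackets...
try:
    tag["data-missing"]  # hypothetical attribute, not present on the tag
    raised = False
except KeyError:
    raised = True

# ...while .get() returns None, or a default you supply
missing = tag.get("data-missing")
fallback = tag.get("data-missing", "n/a")

# .attrs exposes all attributes as a dictionary
all_attrs = dict(tag.attrs)

print(address)   # 4103 Beach Bluff Rd
print(missing)   # None
print(fallback)  # n/a
```

Use tag["content"] when the attribute must be present and a KeyError is the right failure; use tag.get("content") when its absence is acceptable.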
Demo:
>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ')
>>> soup.find(itemprop="streetAddress").get_text()
''
>>> soup.find(itemprop="streetAddress").get("content")
'4103 Beach Bluff Rd'
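On a real page, soup.find() returns None when nothing matches, so chaining .get("content") directly would raise AttributeError. The helper below is a hypothetical sketch (meta_content is not part of BeautifulSoup) that restricts the search to <meta> tags and returns None when either the tag or its content attribute is missing. It assumes Python 3 and "html.parser", and the extra itemprop values in the sample HTML are illustrative only.

```python
from typing import Optional

from bs4 import BeautifulSoup


def meta_content(soup: BeautifulSoup, itemprop: str) -> Optional[str]:
    """Hypothetical helper: content of <meta itemprop=...>, or None."""
    # attrs= filters on the itemprop attribute and limits the match to <meta>
    tag = soup.find("meta", attrs={"itemprop": itemprop})
    if tag is None:
        return None  # no matching <meta> tag in the document
    return tag.get("content")  # None if the tag lacks a content attribute


html = """
<html><head>
  <meta itemprop="streetAddress" content="4103 Beach Bluff Rd">
  <meta itemprop="postalCode">
</head></html>
"""
soup = BeautifulSoup(html, "html.parser")

street = meta_content(soup, "streetAddress")
postal = meta_content(soup, "postalCode")         # tag exists, no content
locality = meta_content(soup, "addressLocality")  # no such tag

# <meta> is a void element with no children, which is why get_text() is empty
meta_tag = soup.find("meta", attrs={"itemprop": "streetAddress"})
text = meta_tag.get_text()

print(street)      # 4103 Beach Bluff Rd
print(postal)      # None
print(locality)    # None
print(repr(text))  # ''
```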