Grabbing meta content with Beautiful Soup

Question

Do I need to use regex here?

The content I want looks like:

<meta content="text I want to grab" name="description"/>

However, there are many tags that start with "meta content="; I want the one that ends in name="description". I'm pretty new to regex, but I thought BS would be able to handle this.

Answer 1

Assuming you were able to read the HTML contents into a variable named html, first parse the HTML with Beautiful Soup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

Then, to search for <meta content="text I want to grab" name="description"/>, you have to find a tag with name 'meta' and attribute name='description':

def is_meta_description(tag):
    # Use .get() so <meta> tags without a name attribute do not raise KeyError
    return tag.name == 'meta' and tag.get('name') == 'description'

meta_tag = soup.find(is_meta_description)

You are trying to fetch the content attribute of the tag, so:

content = meta_tag['content']
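
The steps above can be sketched end to end as follows. This is only a minimal illustration: the sample html string is hypothetical, and the sketch uses tag.get('name') rather than tag['name'] so that meta tags without a name attribute (such as a charset tag) do not raise KeyError. It also guards against find() returning None when no tag matches.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in practice `html` comes from a file or an HTTP response.
html = """
<html><head>
<meta charset="utf-8"/>
<meta content="width=device-width" name="viewport"/>
<meta content="text I want to grab" name="description"/>
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")


def is_meta_description(tag):
    # .get() returns None for tags that have no name attribute
    return tag.name == "meta" and tag.get("name") == "description"


meta_tag = soup.find(is_meta_description)

# find() returns None when nothing matches, so check before reading the attribute
content = meta_tag.get("content") if meta_tag is not None else None
print(content)
```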

Since it is a simple search, there is also a simpler way to find the tag:

meta_tag = soup.find('meta', attrs={'name': 'description'})
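
As a rough sketch of the shorter form, the example below runs the attrs lookup on a hypothetical html string and, for comparison, does the same lookup with a CSS attribute selector via select_one. The filter goes in the attrs dictionary because name is already the first parameter of find() (the tag name), so it cannot be passed as a keyword filter. The helper description_of is an illustrative name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only.
html = (
    '<head><meta charset="utf-8"/>'
    '<meta content="text I want to grab" name="description"/></head>'
)
soup = BeautifulSoup(html, "html.parser")

# Attribute filter passed through attrs, as in the answer
by_attrs = soup.find("meta", attrs={"name": "description"})

# Equivalent lookup written as a CSS attribute selector
by_css = soup.select_one('meta[name="description"]')


def description_of(tag):
    # Illustrative helper: return the content attribute, or None if the
    # tag was not found or has no content attribute.
    if tag is None or not tag.has_attr("content"):
        return None
    return tag["content"]


print(description_of(by_attrs))
print(description_of(by_css))
```

Both lookups should select the same tag here; which one to use is mostly a matter of taste.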

Answer 1 (score 3, accepted), 2018-08-24 01:21:47
