Find and replace a string in HTML

Question

Given this HTML:

<p class="description" dir="ltr">Name is a fine man. <br></p>

I am trying to replace "Name" with "Id" using the following code:

target = soup.find_all(text="Name")
for v in target:
    v.replace_with('Id')

The output I want is:

<p class="description" dir="ltr">Id is a fine man. <br></p>

When I print the result, the list is empty:

print(target)
[]

Why doesn't it find "Name"?

Thanks!

Answer 1

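The text node in your HTML contains other text besides "Name", so an exact match fails. You need to relax the search condition from "equals" to "contains", for example by using a regular expression. You can then use the plain string replace() method to replace each matched text node with its original text, with the "Name" part changed to "Id", for example: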

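from bs4 import BeautifulSoup
import re

html = """<p class="description" dir="ltr">Name is a fine man. <br></p>"""
# "lxml" (must be installed) wraps the fragment in <html><body>, as in the output below.
soup = BeautifulSoup(html, "lxml")
# A regex filter matches any text node that *contains* "Name".
# string= is the current name of the text= argument (renamed in bs4 4.4.0).
target = soup.find_all(string=re.compile(r'Name'))
for v in target:
    # Swap the whole node for a copy of its text with "Name" changed to "Id".
    v.replace_with(v.replace('Name', 'Id'))
print(soup)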

Output:

<html><body><p class="description" dir="ltr">Id is a fine man. <br/></p></body></html>
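If the same substitution is needed in several places, the approach above could be wrapped in a small function. This is only an illustrative sketch: `replace_text` is a hypothetical helper name, it assumes bs4 with the built-in "html.parser", and it adds re.escape() so that characters such as "." or "(" in the search text are matched literally instead of being treated as regex syntax.

```python
import re

from bs4 import BeautifulSoup


def replace_text(soup, old, new):
    """Hypothetical helper: replace `old` with `new` in every text node of `soup`.

    Returns the number of text nodes that were changed.
    """
    # re.escape() makes the search literal, so "Mr." would not match "Mrs".
    pattern = re.compile(re.escape(old))
    targets = soup.find_all(string=pattern)
    for node in targets:
        # str.replace() does the substitution, so `new` is inserted verbatim.
        node.replace_with(node.replace(old, new))
    return len(targets)


html = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
soup = BeautifulSoup(html, "html.parser")
changed = replace_text(soup, "Name", "Id")
print(changed)  # 1
print(soup)
```

With "html.parser" the printed result is just the <p> element, without the <html><body> wrapper shown in the output above, because that parser does not add those tags.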

Answer 2

It returns an empty list because a plain string filter must match the entire text of the node exactly, so use a regular expression instead.
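A minimal sketch that reproduces the problem, assuming bs4 and the built-in "html.parser". The only text node in the paragraph is "Name is a fine man. " (including the space before <br>), so an exact filter of "Name" finds nothing, while the full string is found.

```python
from bs4 import BeautifulSoup

html = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
soup = BeautifulSoup(html, "html.parser")

# A plain string filter is compared for equality with each whole text node.
exact = soup.find_all(string="Name")
whole = soup.find_all(string="Name is a fine man. ")
print(len(exact), len(whole))  # 0 1
```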

From the official BeautifulSoup documentation on searching text:

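text is an argument that lets you search for NavigableString objects instead of Tags. Its value can be a string, a regular expression, a list or dictionary, True or None, or a callable that takes a NavigableString object as its argument: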

soup.findAll(text="one")
# [u'one']
soup.findAll(text=re.compile("paragraph"))
# [u'This is paragraph ', u'This is paragraph ']
soup.findAll(text=lambda x: len(x) < 12)
# [u'Page title', u'one', u'.', u'two', u'.']
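The callable form can also perform the replacement from the question without a regular expression. A minimal sketch, assuming bs4 and "html.parser"; the lambda simply keeps every text node that contains "Name":

```python
from bs4 import BeautifulSoup

html = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
soup = BeautifulSoup(html, "html.parser")

# The callable is called with each text node; returning True keeps the node.
targets = soup.find_all(string=lambda s: "Name" in s)
for node in targets:
    node.replace_with(node.replace("Name", "Id"))

# The paragraph now reads "Id is a fine man."
print(soup.p)
```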

PS: This has already been discussed here and here.


2 answers

Answer 1
Score 7, accepted, 2015-07-04 12:02:50

Answer 2
Score 1, 2015-07-04 12:38:53
