Find and replace strings in HTML

Given this HTML:

<p class="description" dir="ltr">Name is a fine man. <br></p>

I'm trying to replace "Name" using the following code:

target = soup.find_all(text="Name")
for v in target:
    v.replace_with('Id')

The output I'd like is:

<p class="description" dir="ltr">Id is a fine man. <br></p>

But when I print the result, the list is empty:

print(target)
[]

Why doesn't it find "Name"?

Thanks!

The text node in your HTML contains other text besides "Name", so an exact match fails. Relax the search criterion from "equals" to "contains", for example by using a regular expression. Then replace each matched text node with its original text, using the plain str.replace() method to swap the "Name" part for "Id":

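from bs4 import BeautifulSoup
import re

html = """<p class="description" dir="ltr">Name is a fine man. <br></p>"""
soup = BeautifulSoup(html, "lxml")  # explicit parser; lxml adds the <html><body> wrapper
# A regex filter matches any text node that merely contains "Name".
# (Since bs4 4.4.0 this argument is also available as string=.)
target = soup.find_all(text=re.compile(r'Name'))
for v in target:
    # Keep the rest of the node's text and swap only the "Name" part.
    v.replace_with(v.replace('Name', 'Id'))
print(soup)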

Output:

<html><body><p class="description" dir="ltr">Id is a fine man. <br/></p></body></html>
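If BeautifulSoup is not available, the same idea (rewrite only the text nodes, leave tags and attributes alone) can be sketched in Python 3 with the standard library's html.parser. The class TextNodeReplacer and the helper replace_in_text are hypothetical names, and this is a simplified sketch rather than a drop-in replacement: it drops comments and doctypes, and it would also escape the contents of inline <script>/<style> elements.

```python
import html
import re
from html.parser import HTMLParser


class TextNodeReplacer(HTMLParser):
    """Re-emit HTML, applying a regex substitution to text nodes only.

    Tags are copied through verbatim, so attribute values are never
    rewritten. Simplified sketch: comments and doctypes are dropped,
    and <script>/<style> contents would be escaped like normal text.
    """

    def __init__(self, pattern, replacement):
        super().__init__(convert_charrefs=True)  # text arrives with entities decoded
        self.pattern = re.compile(pattern)
        self.replacement = replacement
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # self-closing tags such as <br/>

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text nodes are rewritten; re-escape what convert_charrefs decoded.
        replaced = self.pattern.sub(self.replacement, data)
        self.out.append(html.escape(replaced, quote=False))


def replace_in_text(source, pattern, replacement):
    """Hypothetical helper: substitute `pattern` in the text nodes of `source`."""
    parser = TextNodeReplacer(pattern, replacement)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)


src = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
print(replace_in_text(src, r"\bName\b", "Id"))
# <p class="description" dir="ltr">Id is a fine man. <br></p>
```

The pattern r"\bName\b" uses word boundaries so that a word such as "Named" is left alone; pass plain "Name" to mimic the str.replace() call in the answer's code.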

It returns an empty list because, when text is a plain string, it must match the entire text of a node exactly. Use a regular expression instead.
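To see why the exact-string search comes back empty, it helps to list the text nodes the parser actually produces. This sketch uses Python 3's standard html.parser instead of BeautifulSoup (the class name TextNodeCollector is hypothetical) and mirrors the two matching rules: a plain string is compared with the whole node, while a regex is searched anywhere inside it.

```python
import re
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect every text node (what BeautifulSoup calls a NavigableString)."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


collector = TextNodeCollector()
collector.feed('<p class="description" dir="ltr">Name is a fine man. <br></p>')
collector.close()

print(collector.nodes)  # ['Name is a fine man. ']

# Exact-string rule: the whole node must equal "Name", so nothing matches.
exact = [n for n in collector.nodes if n == "Name"]
# Regex rule: search() succeeds if "Name" occurs anywhere in the node.
regex = [n for n in collector.nodes if re.search("Name", n)]

print(exact)  # []
print(regex)  # ['Name is a fine man. ']
```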

From the official docs, BeautifulSoup - Search text:

text is an argument that lets you search for NavigableString objects instead of Tags. Its value can be a string, a regular expression, a list or dictionary, True or None, or a callable that takes a NavigableString object as its argument:

soup.findAll(text="one")
# [u'one']
soup.findAll(text=re.compile("paragraph"))
# [u'This is paragraph ', u'This is paragraph ']
soup.findAll(text=lambda x: len(x) < 12)
# [u'Page title', u'one', u'.', u'two', u'.']
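The filter types listed above can be summarized with a small, simplified model in plain Python 3. The helper text_filter_matches is hypothetical and is not BeautifulSoup's actual code; it covers strings, regular expressions, True, callables and lists, and leaves out dictionaries and None.

```python
import re


def text_filter_matches(filt, node):
    """Hypothetical, simplified model of testing one text node against a filter."""
    if filt is True:
        return True                             # True: every text node matches
    if isinstance(filt, str):
        return node == filt                     # string: whole-node equality
    if isinstance(filt, re.Pattern):
        return filt.search(node) is not None    # regex: match anywhere in the node
    if callable(filt):
        return bool(filt(node))                 # callable: your predicate decides
    if isinstance(filt, (list, tuple)):
        # list: the node matches if any element matches
        return any(text_filter_matches(f, node) for f in filt)
    raise TypeError(f"unsupported filter: {filt!r}")


nodes = ["Name is a fine man. ", "Name", "Id"]

by_string = [n for n in nodes if text_filter_matches("Name", n)]
by_regex = [n for n in nodes if text_filter_matches(re.compile("Name"), n)]
by_callable = [n for n in nodes if text_filter_matches(lambda x: len(x) < 12, n)]
by_list = [n for n in nodes if text_filter_matches(["Id", "Name"], n)]

print(by_string)    # ['Name']
print(by_regex)     # ['Name is a fine man. ', 'Name']
print(by_callable)  # ['Name', 'Id']
print(by_list)      # ['Name', 'Id']
```

Only the string filter requires the entire node to match, which is why text="Name" misses the node "Name is a fine man. ".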

PS: This has already been discussed in other answers here and here.
