Replace a certain character contained in the text part of an HTML string in Python
I have a string that is valid HTML, like this:
s = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>"""
I want to replace a certain character, say a, in this string with x, on the condition that only occurrences of a in the inner text of the HTML are replaced. Any a that is part of a markup tag or an attribute value should not be replaced.
I tried using BeautifulSoup and its get_text() method, but that doesn't serve my purpose. Is there a way I can achieve this in Python?
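To illustrate why get_text() falls short here, a minimal sketch follows. The short HTML snippet and the variable names are made up for this example. get_text() returns only the concatenated text, so the tags and attributes are gone by the time the replacement happens and the document cannot be rebuilt from the result.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line document, used only for this illustration
html = '<p class="a">an <a href="/a">apple</a></p>'
soup = BeautifulSoup(html, "html.parser")

# get_text() flattens the document to plain text and drops all markup
text_only = soup.get_text()
replaced = text_only.replace("a", "x")

print(replaced)  # the <p> and <a> tags are no longer present
```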
You can use BeautifulSoup to give you a list of all of the text elements within the document. For each of these, you can then use the replace_with() function to replace the NavigableString object with an updated version, in your case one with the necessary characters replaced:
from bs4 import BeautifulSoup, NavigableString
s = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>"""
soup = BeautifulSoup(s, "html.parser")

# Copy the generator into a list first, since the tree is modified inside the loop
for text in list(soup.strings):
    text.replace_with(NavigableString(text.replace('a', 'x')))

print(soup)
So replacing all a characters with x would give you:
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon x time there were three little sisters; xnd their nxmes were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lxcie</a> xnd
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
xnd they lived xt the bottom of x well.</p>
<p class="story">...</p></body></html>
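The same approach can be wrapped in a reusable function. The following is only a sketch under some assumptions: the name replace_in_text and its skip_tags parameter are hypothetical, and skipping comments and the contents of script and style tags is an extra precaution that the question did not ask for.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def replace_in_text(html, old, new, skip_tags=("script", "style")):
    """Replace `old` with `new` in text nodes only, leaving markup untouched.

    Hypothetical helper: the name and the skip_tags parameter are
    assumptions made for this example.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Copy the generator into a list, because the tree changes during the loop
    for node in list(soup.strings):
        if isinstance(node, Comment):
            continue  # leave HTML comments as they are
        if node.parent is not None and node.parent.name in skip_tags:
            continue  # leave inline JavaScript and CSS as they are
        node.replace_with(NavigableString(node.replace(old, new)))
    return str(soup)


result = replace_in_text('<p class="a">a cat <a href="/a">sat</a></p>', "a", "x")
print(result)
```

Only the text between the tags changes here; the class and href values keep their a characters.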