Replace a certain character contained in the text part of an HTML string in Python

I have a string of valid HTML, for example:

s = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>"""

I want to replace a certain character in this string, say a, with x, on the condition that only occurrences of a in the inner text of the HTML are replaced. Any a that is part of a markup tag or an attribute value should not be replaced.

I tried using BeautifulSoup and its get_text() method, but that doesn't serve my purpose. Is there a way I can achieve this in Python?
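To illustrate the problem, here is a minimal sketch of why get_text() alone falls short. The short html string is made up for the example. get_text() returns a plain str made of the text nodes only, so after the replacement there is no markup left to put back:

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration, not the full document from the question
html = '<p class="story">a <a href="http://example.com/lacie">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")

# get_text() concatenates the text nodes and drops every tag and attribute
text_only = soup.get_text()
replaced = text_only.replace("a", "x")

print(replaced)
```

The replacement itself works, but the result is no longer HTML.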

You can use BeautifulSoup to get a list of all the text nodes in the document. For each one, you can then call its replace_with() method to swap the NavigableString object for an updated version, in your case one with the required characters replaced:

from bs4 import BeautifulSoup, NavigableString

s = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>"""

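# Parse with Python's built-in HTML parser
soup = BeautifulSoup(s, "html.parser")

# Copy the strings into a list first, because replacing nodes while
# iterating over soup.strings would modify the tree mid-iteration
for text in list(soup.strings):
    text.replace_with(NavigableString(text.replace('a', 'x')))

print(soup)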

So replacing all a characters with x would give you:

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon x time there were three little sisters; xnd their nxmes were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lxcie</a> xnd
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
xnd they lived xt the bottom of x well.</p>
<p class="story">...</p></body></html>
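If you need this in more than one place, the same idea can be wrapped in a function. The following is only a sketch: replace_in_text and SKIPPED_PARENTS are hypothetical names, and skipping comments and the contents of script and style tags is an assumption about what should count as "text", not something the question asks for.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Assumption: code and CSS should be left untouched
SKIPPED_PARENTS = {"script", "style"}


def replace_in_text(html, old, new):
    """Hypothetical helper: replace `old` with `new` in text nodes only."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(string=True) returns a list, so the tree can be
    # modified safely while looping over it
    for node in soup.find_all(string=True):
        if isinstance(node, Comment):
            continue  # leave HTML comments as they are
        if node.parent is not None and node.parent.name in SKIPPED_PARENTS:
            continue
        if old in node:
            node.replace_with(NavigableString(node.replace(old, new)))
    return str(soup)


result = replace_in_text(
    '<p class="a"><a href="/a">banana</a><!-- a --></p>', "a", "x"
)
print(result)
```

Here the class and href values and the comment keep their a characters, while the link text banana is rewritten.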
