Replace a certain character in the text portion of an HTML string in Python

Question

I have a string containing valid HTML, for example:

s = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>"""

I want to replace a certain character in this string, say a, with x. The condition is that only an a occurring in the inner text of the HTML should be replaced; any a that is part of a markup tag or an attribute value should be left alone.

I tried using BeautifulSoup and its get_text() method, but that doesn't serve my purpose. Is there a way I can achieve this in Python?

Answer 1

You can use BeautifulSoup to get all of the text nodes in the document. Each one is a NavigableString object, so you can call its replace_with() method to swap it for an updated copy, in your case one with the relevant characters replaced:

from bs4 import BeautifulSoup, NavigableString

s = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>"""

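# Parse with the parser bundled with Python's standard library.
soup = BeautifulSoup(s, "html.parser")

# Copy the text nodes into a list first, because the tree is modified inside the loop.
for text in list(soup.strings):
    # Only the text node is swapped out; tag names and attribute values are not touched.
    text.replace_with(NavigableString(text.replace('a', 'x')))

print(soup)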

So replacing all a characters with x would give you:

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon x time there were three little sisters; xnd their nxmes were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lxcie</a> xnd
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
xnd they lived xt the bottom of x well.</p>
<p class="story">...</p></body></html>
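
If you need this in more than one place, the same idea can be wrapped in a small function. The following is only a sketch: replace_in_text and SKIPPED_PARENTS are hypothetical names that are not part of the answer above, and skipping comments and the contents of script and style tags is an assumption about what "inner text" should mean for you.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Assumption: text inside these tags is code, not visible text, so it is left alone.
SKIPPED_PARENTS = {"script", "style"}


def replace_in_text(html, old, new):
    """Return html with old replaced by new in text nodes only (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(string=True) returns a list, so modifying the tree in the loop is safe.
    for node in soup.find_all(string=True):
        if isinstance(node, Comment):
            continue  # leave HTML comments unchanged
        if node.parent is not None and node.parent.name in SKIPPED_PARENTS:
            continue  # leave script and style contents unchanged
        node.replace_with(NavigableString(node.replace(old, new)))
    return str(soup)


sample = '<p class="a">banana <a href="/a">cat</a><!-- a --></p><script>var a = 1;</script>'
print(replace_in_text(sample, "a", "x"))
```

The explicit checks make the exclusions visible in the code; drop them if you do want comments or script contents rewritten as well.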

Answer 1 (accepted) 2018-11-02 09:53:39
