Python find and replace with Beautiful Soup

Question

I am using Beautiful Soup to replace occurrences of a pattern with an href link inside an HTML file.

I am facing the problem described below.

modified_contents = re.sub("([^http://*/s]APP[a-z]{2}[0-9]{2})", "<a href=\"http://stack.com=\\1\">\\1</a>", str(soup))

Sample input 1:

Input File contains APPdd34

Output File contains <a href="http://stack.com=APPdd34"> APPdd34</a>

Sample input 2:

Input File contains <a href="http://stack.com=APPdd34"> APPdd34</a>

Output File contains <a href="http://stack.com=<a href="http://stack.com=APPdd34"> APPdd34</a>"> <a href="http://stack.com=APPdd34"> APPdd34</a></a>

Desired Output File 2 is the same as Sample Input File 2.

How can I rectify this problem?
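The behaviour in Sample input 2 can be reproduced with the `re` module alone. The sketch below copies the pattern and replacement from the `re.sub` call above; the variable names (`PATTERN`, `REPLACEMENT`, `captured`, `rewrapped`) are illustrative only. It suggests that `[^http://*/s]` acts as a negated character class that consumes one preceding character, rather than as a "not preceded by http://" test, so identifiers that are already inside a link still match.

```python
import re

# Pattern and replacement copied from the question.
PATTERN = r"([^http://*/s]APP[a-z]{2}[0-9]{2})"
REPLACEMENT = r'<a href="http://stack.com=\1">\1</a>'

# [^...] is a negated character class: it matches exactly ONE character
# that is not h, t, p, ':', '/', '*' or s. That character becomes part of
# the captured group.
match = re.search(PATTERN, "Input File contains APPdd34")
captured = match.group(1)  # the identifier plus the character before it

# An identifier at the very start of the string has no preceding character,
# so the pattern cannot match it at all.
no_match = re.search(PATTERN, "APPdd34")

# Running the substitution on already-linked markup wraps both the href
# value and the link text a second time.
already_linked = '<a href="http://stack.com=APPdd34"> APPdd34</a>'
rewrapped = re.sub(PATTERN, REPLACEMENT, already_linked)

print(repr(captured))
print(no_match)
print(rewrapped)
```

Because the substitution runs on `str(soup)`, it cannot tell text from attribute values or from text that already sits inside an `<a>` element.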

Answer 1

This may not entirely answer your problem, because I don't know what an entire input file could look like, but I hope this is a direction you can take.

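# BeautifulSoup 3 API (Python 2). With bs4 the import is `from bs4 import BeautifulSoup`.
from BeautifulSoup import BeautifulSoup, Tag

# Case 1: the identifier appears as plain text.
text = """APPdd34"""
soup = BeautifulSoup(text)
var1 = soup.text

# Case 2: the identifier is already wrapped in a link; take the link's text.
text = """<a href="http://stack.com=APPdd34"> APPdd34</a>"""
soup = BeautifulSoup(text)
var2 = soup.find('a').text.strip()  # strip the leading space inside the link

# Build fresh <a> tags from the extracted text and insert them into a new document.
soup = BeautifulSoup("<p>Some new html</p>")
tag1 = Tag(soup, "a", {'href': 'http://stack.com=' + var1})
tag1.insert(0, var1)  # Insert text
tag2 = Tag(soup, "a", {'href': 'http://stack.com=' + var2})
tag2.insert(0, var2)
soup.insert(0, tag1)
soup.insert(3, tag2)
print soup.prettify()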

So basically, just use BeautifulSoup to extract the text, and then you can build Tags from there.
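One way to apply that idea to a whole document is to walk the text nodes, skip any that already sit inside an `<a>` element, and build the links as tags instead of editing the serialized HTML. The following is a minimal sketch, not the answer author's code: it assumes the newer `bs4` package (version 4.4 or later for the `string=` argument) rather than BeautifulSoup 3, and the names `linkify`, `PATTERN` and `BASE_URL` are made up for illustration. The `\b` word boundaries are also an assumption about what should count as a match.

```python
import re
from bs4 import BeautifulSoup, NavigableString

# Assumed pattern: the identifier from the question, delimited by word boundaries.
PATTERN = re.compile(r"\bAPP[a-z]{2}[0-9]{2}\b")
BASE_URL = "http://stack.com="  # link prefix taken from the question


def linkify(html):
    """Wrap each matching identifier in a link unless it is already inside <a>."""
    soup = BeautifulSoup(html, "html.parser")
    # Take a snapshot of the matching text nodes, because the tree is
    # modified inside the loop.
    for node in list(soup.find_all(string=PATTERN)):
        if node.find_parent("a") is not None:
            continue  # already linked, leave it alone
        text = str(node)
        pieces = []
        pos = 0
        for m in PATTERN.finditer(text):
            if m.start() > pos:
                pieces.append(NavigableString(text[pos:m.start()]))
            link = soup.new_tag("a", href=BASE_URL + m.group(0))
            link.string = m.group(0)
            pieces.append(link)
            pos = m.end()
        if pos < len(text):
            pieces.append(NavigableString(text[pos:]))
        # Swap the original text node for the sequence of strings and links.
        previous = pieces[0]
        node.replace_with(previous)
        for piece in pieces[1:]:
            previous.insert_after(piece)
            previous = piece
    return str(soup)


if __name__ == "__main__":
    print(linkify("<p>Input File contains APPdd34</p>"))
    print(linkify('<a href="http://stack.com=APPdd34"> APPdd34</a>'))
```

Since attribute values are never text nodes, the `href` of an existing link is not touched, and running `linkify` twice on the same input should give the same result as running it once. Comments and script contents are not treated specially in this sketch.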

Answer 1: 0, 2011-10-03 16:09:23