
How to use Python to diff two HTML files

I want to use Python to diff two HTML files.

Example:

html_1 = """
<p>i love it</p>
"""
html_2 = """ 
<h2>i love it </p>
"""

The diff output should look like this:

diff_html = """
<del><p>i love it</p></del><ins><h2>i love it</h2></ins>
"""

Is there a Python library that can help me do this?

lxml can do something similar to what you want. From the docs:

>>> from lxml.html.diff import htmldiff
>>> doc1 = '''<p>Here is some text.</p>'''
>>> doc2 = '''<p>Here is <b>a lot</b> of <i>text</i>.</p>'''
>>> print(htmldiff(doc1, doc2))
<p>Here is <ins><b>a lot</b> of <i>text</i>.</ins> <del>some text.</del> </p>

I don't know of any other Python library for this specific task, but you may want to look into word-by-word diffs. They may approximate what you want.

One example is this one, implemented in both PHP and Python (save it as diff.py, then import diff):

>>> diff.htmlDiff(a, b)
'<del><p>i</del> <ins><h2>i</ins> love <del>it</p></del> <ins>it </p></ins>'
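The linked diff.py is not reproduced here, so the following is only a minimal sketch of the word-by-word idea using the standard library's difflib.SequenceMatcher. The function name word_diff_html is hypothetical, and this is not the linked script's actual implementation.

```python
import difflib


def word_diff_html(old, new):
    """Wrap deleted words in <del> and inserted words in <ins>.

    Hypothetical helper: splits both inputs on whitespace and diffs
    the resulting word lists, so tags glued to words stay together.
    """
    old_words = old.split()
    new_words = new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        # "replace" is emitted as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            parts.append("<del>%s</del>" % " ".join(old_words[i1:i2]))
        if tag in ("insert", "replace"):
            parts.append("<ins>%s</ins>" % " ".join(new_words[j1:j2]))
    return " ".join(parts)


print(word_diff_html("<p>i love it</p>", "<h2>i love it </p>"))
```

Because the split is on whitespace only, markup such as `<p>i` is treated as a single word. That is fine for a rough approximation, but the result is not guaranteed to be well-formed HTML.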

I found two Python libraries that are helpful:

  1. htmltreediff
  2. htmldiff

However, both of them use Python's difflib library to diff text, and I want to use Google's diff.

Check out diff2HtmlCompare (full disclosure: I'm the author). If you're just trying to visualize the differences, this may help you. If you are trying to extract the differences and do something with them, you can use difflib as suggested by others (the script above just wraps difflib and uses pygments for syntax highlighting). Doug Hellmann has done a pretty good job detailing how to use difflib; I'd suggest checking out his tutorial.
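As a rough illustration of extracting the differences with difflib, here is a small sketch based on difflib.unified_diff. The helper name unified_report and the labels "html_1" and "html_2" are assumptions made for this example; it is not taken from diff2HtmlCompare or from the tutorial.

```python
import difflib


def unified_report(old_text, new_text):
    """Return the unified diff of two strings as a list of lines.

    Hypothetical helper; the file labels are placeholders.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="html_1",
            tofile="html_2",
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )


report = unified_report("<p>i love it</p>", "<h2>i love it </p>")
for line in report:
    print(line)
```

Lines beginning with a single "-" or "+" carry the removed and added content; the "---", "+++" and "@@" lines are headers.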

You could use difflib.ndiff() to look for the "- " and "+ " line prefixes and replace them with your desired HTML.

import difflib

html_1 = """
<p>i love it</p>
"""
html_2 = """
<h2>i love it </p>
"""

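diff_html = ""
# ndiff prefixes every line with a two-character code: "- " (only in the
# first input), "+ " (only in the second), "  " (common) or "? " (hint).
the_diffs = difflib.ndiff(html_1.splitlines(), html_2.splitlines())
for each_diff in the_diffs:
    if each_diff[0] == "-":
        diff_html += "<del>%s</del>" % each_diff[1:].strip()
    elif each_diff[0] == "+":
        diff_html += "<ins>%s</ins>" % each_diff[1:].strip()

print(diff_html)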

The result:

<del><p>i love it</p></del><ins><h2>i love it </p></ins>
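The loop above keeps only the changed lines. If the unchanged lines should also appear in the output, a variation along these lines may help. It is a sketch only, and the function name merge_with_markup is hypothetical.

```python
import difflib


def merge_with_markup(old_text, new_text):
    """Return all lines of the diff, marking removals and additions.

    Hypothetical helper built on difflib.ndiff's two-character prefixes.
    """
    out = []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        code, content = line[:2], line[2:]
        if code == "- ":
            out.append("<del>%s</del>" % content)
        elif code == "+ ":
            out.append("<ins>%s</ins>" % content)
        elif code == "  ":
            out.append(content)  # line present in both inputs
        # "? " lines are intraline hints meant for human readers; skip them.
    return "\n".join(out)


print(merge_with_markup(
    "<h1>title</h1>\n<p>i love it</p>",
    "<h1>title</h1>\n<h2>i love it </p>",
))
```

Comparing the full two-character prefix rather than only the first character avoids any ambiguity with the "? " hint lines.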

AFAIK, Python has a built-in difflib module that can do this.

Not exactly the output you asked for, but the standard library's difflib has a simple HtmlDiff tool that will build an HTML diff table for you.

import difflib

html_1 = """
<p>i love it</p>
"""
html_2 = """ 
<h2>i love it </p>
"""

htmldiff = difflib.HtmlDiff()
# make_table expects two lists of lines, so split each document first
html_table = htmldiff.make_table(html_1.splitlines(), html_2.splitlines())
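make_table returns only a table fragment. For a standalone page, HtmlDiff also provides make_file, which wraps the same table in a complete HTML document with its own styles. A minimal sketch follows; the helper name build_diff_page and the column labels are assumptions for illustration.

```python
import difflib

html_1 = "<p>i love it</p>\n"
html_2 = "<h2>i love it </p>\n"


def build_diff_page(old_text, new_text):
    """Return a complete HTML page showing a side-by-side diff.

    Hypothetical helper; "html_1" and "html_2" are just column labels.
    """
    differ = difflib.HtmlDiff()
    return differ.make_file(
        old_text.splitlines(),
        new_text.splitlines(),
        fromdesc="html_1",
        todesc="html_2",
    )


page = build_diff_page(html_1, html_2)
print(len(page), "characters of HTML generated")
```

The returned string can be written to a file and opened in a browser. Note that HtmlDiff escapes the input, so the compared markup is displayed as text rather than rendered.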
