How to compare two HTML files in Python and print only the differences?

I have two HTML reports generated by Sonar that show the issues in my code.

Problem statement: I need to compare the two Sonar reports and find the differences, i.e. the new issues that were introduced. Basically, I need to find the differences between the HTML files and print only those differences.

I tried a few things:

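import difflib

# Read both reports as lists of lines; 'with' closes the files afterwards
with open('sonarlint-report.html', 'r') as f1, open('sonarlint-report_latest.html', 'r') as f2:
    file1 = f1.readlines()
    file2 = f2.readlines()

# Build a full side-by-side HTML comparison of the two reports
htmlDiffer = difflib.HtmlDiff()
htmldiffs = htmlDiffer.make_file(file1, file2)

with open('comparison.html', 'w') as outfile:
    outfile.write(htmldiffs)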

This gives me a comparison.html that is just a full side-by-side diff of the two HTML files. It doesn't print only the lines that differ.

Should I try parsing the HTML and then somehow print only the differences? Please suggest.

If you use difflib.Differ, you can keep only the differing lines by filtering on the two-letter code that is written at the start of every line. From the docs:

class difflib.Differ

This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

Each line of a Differ delta begins with a two-letter code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

Lines beginning with '?' attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain tab characters.

By keeping only the lines that start with '- ' or '+ ', you get just the differences.
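The filtering described above could look roughly like the sketch below. The helper name only_differences, its prefixes parameter, and the sample report lines are made up for illustration; only difflib.Differ and its compare method come from the standard library.

```python
import difflib


def only_differences(old_lines, new_lines, prefixes=("- ", "+ ")):
    """Return the Differ delta lines that start with one of `prefixes`.

    `prefixes` must be a tuple, because str.startswith accepts a tuple
    of candidate prefixes but not a list.
    """
    delta = difflib.Differ().compare(old_lines, new_lines)
    # Lines starting with '  ' (common) and '? ' (hints) are dropped here
    return [line for line in delta if line.startswith(prefixes)]


# Made-up sample data standing in for the lines of the two reports
old_report = [
    "<li>Unused import</li>\n",
    "<li>Empty catch block</li>\n",
]
new_report = [
    "<li>Unused import</li>\n",
    "<li>Empty catch block</li>\n",
    "<li>Hard-coded password</li>\n",
]

for line in only_differences(old_report, new_report):
    print(line, end="")
```

For real files, the two lists would come from readlines() on each report. Passing prefixes=("+ ",) keeps only the lines unique to the second file, which is probably closest to "new issues that got introduced".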

I would start by iterating through both HTML files line by line and checking whether the lines are the same.

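with open('file1.html') as file1, open('file2.html') as file2:
    # zip pairs lines by position and stops at the end of the shorter file
    for file1Line, file2Line in zip(file1, file2):
        if file1Line != file2Line:
            # Strip only the trailing newline before printing the mismatched pair
            print(file1Line.rstrip('\n'))
            print(file2Line.rstrip('\n'))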

You'll have to deal with newline characters and with runs of several differing lines in a row, but this is probably a good start :)
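One caveat with comparing by position: a single inserted line shifts everything after it, so every later pair looks different. A possible way around that, sketched here as a swapped-in technique rather than part of the answer above, is to let difflib.SequenceMatcher align the two line lists first and then collect only the lines that appear on the new side. The function name added_lines and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def added_lines(old_lines, new_lines):
    """Return lines that are in new_lines but have no match in old_lines."""
    # autojunk=False turns off the heuristic that ignores very frequent lines
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'insert' and 'replace' blocks carry content from the new file
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added


# Made-up sample data: one issue inserted in the middle of the new report
old_report = ["<li>Issue A</li>", "<li>Issue B</li>"]
new_report = ["<li>Issue A</li>", "<li>Issue X</li>", "<li>Issue B</li>"]

for line in added_lines(old_report, new_report):
    print(line)
```

A positional comparison of the same two lists would flag the second line as a mismatch and never reach the third line of the longer file, whereas the aligned version reports only the inserted one.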
