
How to compare two HTML files in python and print only the differences?

I have two html reports generated from sonar showing the issues in my code.

Problem Statement: I need to compare two sonar reports and find the differences, i.e. the new issues that were introduced. Basically, I need to find the differences between the two HTML files and print only those differences.

I tried a few things:

import difflib
file1 = open('sonarlint-report.html', 'r').readlines()
file2 = open('sonarlint-report_latest.html', 'r').readlines()

htmlDiffer = difflib.HtmlDiff()
htmldiffs = htmlDiffer.make_file(file1, file2)

with open('comparison.html', 'w') as outfile:
    outfile.write(htmldiffs)

This gives me a comparison.html that is just a full side-by-side diff of the two HTML files. It doesn't print only the lines that differ.

Should I try HTML parsing and then somehow get the differences only to be printed? Please suggest.

If you use difflib.Differ, you can keep only the lines that differ by filtering on the two-letter code written at the start of every line. From the docs:

class difflib.Differ

This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses SequenceMatcher both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines.

Each line of a Differ delta begins with a two-letter code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence

Lines beginning with '?' attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain tab characters.

Keeping only the lines that start with '- ' or '+ ' leaves just the differences.
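As a minimal sketch of that filtering, something like the following should work. The helper name changed_lines and the sample lists are made up for illustration; in practice the two lists would come from readlines() on the two reports.

```python
import difflib

def changed_lines(old_lines, new_lines):
    """Return only the Differ delta lines that start with '- ' or '+ '."""
    delta = difflib.Differ().compare(old_lines, new_lines)
    # Drop the common ('  ') and hint ('? ') lines, keep removals and additions.
    return [line.rstrip('\n') for line in delta
            if line.startswith(('- ', '+ '))]

# Hypothetical sample data standing in for the two sonar reports.
old = ['<li>Issue A</li>\n', '<li>Issue B</li>\n']
new = ['<li>Issue A</li>\n', '<li>Issue B</li>\n', '<li>Issue C</li>\n']

for line in changed_lines(old, new):
    print(line)  # prints: + <li>Issue C</li>
```

If only newly introduced issues matter, filtering on '+ ' alone would probably be enough.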

I would start by trying to iterate through each html file line by line and checking to see if the lines are the same.

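from itertools import zip_longest

with open('file1.html') as file1, open('file2.html') as file2:
    # zip_longest keeps going when one file is longer; the shorter one yields ''.
    for line1, line2 in zip_longest(file1, file2, fillvalue=''):
        if line1 != line2:
            # Strip only the trailing newline so print() does not double it.
            print(line1.rstrip('\n'))
            print(line2.rstrip('\n'))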

You'll have to deal with newline characters and with several differing lines in a row (a single inserted line shifts every pair after it out of step), but this is probably a good start :)
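Pairing lines by position reports every later line as changed once a single line has been inserted. One possible way around that, sketched here as an alternative rather than as part of the answer above, is difflib.unified_diff with zero context lines (n=0). The helper name diff_without_context and the sample lists are hypothetical.

```python
import difflib
from itertools import islice

def diff_without_context(old_lines, new_lines):
    """Return only added ('+') and removed ('-') lines, with no context."""
    diff = difflib.unified_diff(old_lines, new_lines, n=0, lineterm='')
    # The first two lines of a non-empty unified diff are the '---' and
    # '+++' file headers; skip them, then drop the '@@' hunk markers.
    body = islice(diff, 2, None)
    return [line.rstrip('\n') for line in body
            if line.startswith(('+', '-'))]

# Hypothetical sample data: one line inserted in the middle.
old = ['<li>A</li>\n', '<li>B</li>\n']
new = ['<li>A</li>\n', '<li>NEW</li>\n', '<li>B</li>\n']

print(diff_without_context(old, new))  # ['+<li>NEW</li>']
```

Because the diff aligns matching lines first, the unchanged line after the insertion is not reported.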


 