Comparing a list of different sizes and data to output the difference

I am working on a program for a personal project to better understand how lists and dictionaries in Python work. I am an amateur programmer and still learning. The program's goal is to read two files and compare their parameters with one another. If a parameter in one of the files is incorrect or does NOT match, the program creates a new file containing the incorrect/unmatched parameters.

I have already created this, and the program does what it's supposed to. However, I run into an error when comparing a file that has more or fewer parameters than the file it is compared against. In short, the lists I had been comparing had the same number of elements; when the number of elements differs, I get an error, usually a list index out of range.

The gist of it, as best as I could put it, is the following: I have 2 Text Documents:
TextA.txt:

Data1="123.212.2.312"
Dog=12
Cat="127.0.0.1"
Data2=9498
Fish=""
Tiger=9495
Data3=5
Data4=2
Game=55
Tree=280
Falcon=67
Bear=2

TextB.txt:

Dog=123
Cat="127.0.0.1"
Data2=9498
Eagle=""
Tiger=9495
Data3=5
Data4=2
Rock=52
Mountain=380
Falcon=627

As we can see, parameters are missing from both text documents, and some of the parameters present in both are incorrect. I would like to output the differences from TextA.txt ONLY into another text document.

So the program would take the following course of action. (This is how the program currently works when comparing two texts with the same number of parameters. Please take this flowchart with a grain of salt; it is not meant to fully represent the program, just to give a general idea of how it works.) [image: flowchart]

So in the end my output should be:

[image: outputext]

Remember, I do not care if a parameter exists in TextB.txt but not in TextA.txt; what I care about is whether a parameter exists in TextA.txt and NOT in TextB.txt. Confusing, I know, but hopefully the picture will clear things up.

As for my code, it's very long, but the important parts are below. Please note I am also using PyQt4 for the GUI.

with open(compareResults, 'wb') as fdout:
    for index, tabName in enumerate(setNames):
        tabWidget = QtGui.QWidget()
        tabLabel = QtGui.QTextEdit()
        print "Tab Name is :{}".format(tabName)
        fdout.write('{}'.format(tabName) + '\r\n')
        nameData = lst[index]
        print 'name data = {}'.format(nameData)
        for k in nameData:
            if nameData[k] != correct_parameters[k]:
                tabLabel.setTextColor(QtGui.QColor("Red"))
                tabLabel.append('This Parameter is Incorrect: {} = {}'.format(k, nameData[k]))
                fdout.write('\t' + '|' + 'This Parameter is Incorrect: {} = {}'.format(k, nameData[k]) + '\t' + '|' + '\r\n')
                print ('{} = {}'.format(k, nameData[k]))
            elif nameData[k] == correct_parameters[k]:
                tabLabel.setTextColor(QtGui.QColor("Black"))
                tabLabel.append('{} = {}'.format(k, nameData[k]))
                fdout.write('\t' + '|' + '{} = {}'.format(k, nameData[k]) + '\t' + '|' + '\r\n')
                print ('{} = {}'.format(k, nameData[k]))
        tabLayout = QtGui.QVBoxLayout()
        tabLayout.addWidget(tabLabel)
        tabWidget.setLayout(tabLayout)
        self.tabWidget.addTab(tabWidget, tabName)

I believe my downfall is that I am looping through a set number of elements and expecting the same number of elements in both lists. How can I loop through the lists when they do not have the same number of elements?

If the question is too confusing or you need more information/code please let me know and I will edit the question.

EDIT: Just to clarify, I ended up using @CarsonCrane's answer because it helped me create the loop that I needed. This is what my code looks like now:

for k in nameData:
    if k in correct_parameters:
        if nameData[k] != correct_parameters[k]:
            tabLabel.setTextColor(QtGui.QColor("Red"))
            tabLabel.append('This Parameter is Incorrect: {} = {}'.format(k, nameData[k]))
            fdout.write('\t' + '|' + 'This Parameter is Incorrect: {} = {}'.format(k, nameData[k]) + '\t' + '|' + '\r\n')
            print ('{} = {}'.format(k, nameData[k]))
        elif nameData[k] == correct_parameters[k]:
            tabLabel.setTextColor(QtGui.QColor("Black"))
            tabLabel.append('{} = {}'.format(k, nameData[k]))
            fdout.write('\t' + '|' + '{} = {}'.format(k, nameData[k]) + '\t' + '|' + '\r\n')
            print ('{} = {}'.format(k, nameData[k]))
    else:
        tabLabel.setTextColor(QtGui.QColor("Blue"))
        tabLabel.append('{} = {} does not appear in our default'.format(k, nameData[k]))
        fdout.write('\t' + '|' + '{} = {} does not appear in our default'.format(k, nameData[k]) + '\t' + '|' + '\r\n')
        print ('{} = {} does not appear in our default'.format(k, nameData[k]))

Create two dictionaries and parse the values of your file into respective keys and values of the dictionary. Loop through the first dictionary and compare the values.

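d1 = {"Tiger": 9495, "Data3": 5, "Data4": 2}
d2 = {"Tiger": 94, "Data4": 2}

for key, value in d1.items():
    if key in d2:
        if value == d2[key]:
            # same value in both dictionaries
            print('{} = {} matches'.format(key, value))
        else:
            # key exists in both, but the values differ
            print('{} = {} is incorrect'.format(key, value))
    else:
        # d2 doesn't have this key
        print('{} = {} is missing from d2'.format(key, value))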
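The parsing step mentioned above can be sketched roughly as follows. This is only an illustration: the helper names parse_params and classify are made up here, values are kept as raw strings (quotes included), and the sample lines are a shortened subset of TextA.txt and TextB.txt from the question.

```python
def parse_params(lines):
    """Turn 'Name=value' lines into a dict of strings; lines without '=' are skipped."""
    params = {}
    for line in lines:
        line = line.strip()
        if '=' not in line:
            continue
        # Split on the first '=' only, so a value may itself contain '='.
        name, value = line.split('=', 1)
        params[name.strip()] = value.strip()
    return params


def classify(actual, expected):
    """Sort every key of `actual` into 'match', 'incorrect' or 'missing'."""
    result = {'match': [], 'incorrect': [], 'missing': []}
    for key in actual:
        if key not in expected:
            result['missing'].append(key)
        elif actual[key] == expected[key]:
            result['match'].append(key)
        else:
            result['incorrect'].append(key)
    return result


# Shortened samples; in the real program these would be the lines of each file.
text_a = ['Dog=12', 'Cat="127.0.0.1"', 'Fish=""', 'Falcon=67']
text_b = ['Dog=123', 'Cat="127.0.0.1"', 'Eagle=""', 'Falcon=627']

report = classify(parse_params(text_a), parse_params(text_b))
print(report)
```

Keys that exist only in the second file (Eagle here) never show up in the report, which is the one-directional behaviour the question asks for.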

The underlying problem in your program is finding the difference between two sequences, which is a variation of the Longest Common Subsequence problem (LCS, see https://en.wikipedia.org/wiki/Longest_common_subsequence_problem ). Its solution is not straightforward. In Python you can use the difflib library to deal with this kind of problem.

# Assuming you have already parsed the files into two lists

from difflib import SequenceMatcher, Differ
params1 = [
    ('Data1', '123.212.2.312'),
    ('Dog', 12),
    ('Cat', '127.0.0.1'),
    ('Data2', 9498),
    ('Fish', ''),
    ('Tiger', 9495),
    ('Data3', 5),
    ('Data4', 2),
    ('Game', 55),
    ('Tree', 280),
    ('Falcon', 67),
    ('Bear', 2)
]
params2 = [
    ('Dog', 123),
    ('Cat', '127.0.0.1'),
    ('Data2', 9498),
    ('Eagle', ''),
    ('Tiger', 9495),
    ('Data3', 5),
    ('Data4', 2),
    ('Rock', 52),
    ('Mountain', 380),
    ('Falcon', 627)
]

# If the order of the entries in the file is not mandatory you could sort the lists
matcher = SequenceMatcher(None, params1, params2)
if matcher.ratio() != 1:
    print 'Sequences are not equal'
    print list(Differ().compare(params1, params2)) # Prints the difference

You can also get the operations that transform params1 into params2 with:

matcher.get_opcodes()

or the matching blocks with:

matcher.get_matching_blocks()

With this data you just have to do a little work to show the difference on the screen.
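As a rough sketch of that "little work", the opcodes can be filtered down to the entries of the first list that have no equal partner in the second. The function name entries_only_in_first and the shortened sample lists are assumptions made for this example, not part of the original answer.

```python
from difflib import SequenceMatcher


def entries_only_in_first(first, second):
    """Return the items of `first` that SequenceMatcher could not align with `second`."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    unmatched = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'delete' and 'replace' both cover items of `first` with no equal partner;
        # 'insert' only concerns `second`, and 'equal' means the items match.
        if tag in ('delete', 'replace'):
            unmatched.extend(first[i1:i2])
    return unmatched


# Shortened versions of params1 / params2.
params_a = [('Dog', 12), ('Cat', '127.0.0.1'), ('Fish', ''), ('Falcon', 67)]
params_b = [('Dog', 123), ('Cat', '127.0.0.1'), ('Eagle', ''), ('Falcon', 627)]

for name, value in entries_only_in_first(params_a, params_b):
    print('{} = {}'.format(name, value))
```

Note that SequenceMatcher is order-sensitive: an entry that appears in both lists but at a very different position may still be reported as unmatched, which is why sorting both lists first can help when the order in the files does not matter.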

You'll benefit pretty considerably from a couple of things. First off, you can quickly create dictionaries from each file using something like this:

with open('file1.txt') as f:  # split on the first '=' only; skip lines without one
    d1 = dict(l.strip().split('=', 1) for l in f if '=' in l)

That's going to give you a much cleaner way of accessing your individual values. Next, when comparing the two dictionaries, something like this is pretty reasonable:

for key, value in d1.items():
  if key not in d2:
    print "Key '%s' from d1 not in d2" % key
    continue
  value2 = d2[key]
  # other comparison / output code here
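One possible way to fill in the comparison and output part is sketched below. The function name write_differences and the message wording are invented for this example, and an in-memory buffer stands in for the real output file so the snippet runs on its own.

```python
import io


def write_differences(d1, d2, out):
    """Write every entry of d1 that is missing from d2 or has another value there.

    Returns the number of lines written to `out`.
    """
    count = 0
    for key, value in d1.items():
        if key not in d2:
            out.write('{}={} (not in second file)\n'.format(key, value))
        elif value != d2[key]:
            out.write('{}={} (second file has {})\n'.format(key, value, d2[key]))
        else:
            continue  # identical in both files, nothing to report
        count += 1
    return count


# Values are strings, as they would be after splitting each line on '='.
d1 = {'Dog': '12', 'Cat': '"127.0.0.1"', 'Fish': '""', 'Falcon': '67'}
d2 = {'Dog': '123', 'Cat': '"127.0.0.1"', 'Eagle': '""', 'Falcon': '627'}

buffer = io.StringIO()  # stands in for an opened output file
written = write_differences(d1, d2, buffer)
print(buffer.getvalue())
```

Passing a file-like object instead of a file name keeps the function easy to test; in the real program the buffer would presumably be replaced by the opened results file.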
