
Python CSV Reader - Compare Each Row with Each Other Row Within One Column

I want to compare each row of a CSV file with itself and every other row within a column.
For example, if the column values are like this:

Value_1
Value_2
Value_3

The code should pick Value_1 and compare it with Value_1 (yes, with itself too), Value_2 and then with Value_3. Then it should pick up Value_2 and compare it with Value_1, Value_2, Value_3, and so on.

I've written the following code for this purpose:

import csv

csvfile = r"c:\temp\temp.csv"  # raw string so "\t" is not read as a tab
with open(csvfile, newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        for compare_row in reader:
            if row == compare_row:
                print(row,'is equal to',compare_row)
            else:
                print(row,'is not equal to',compare_row)

The code gives the following output:

['Value_1'] is not equal to ['Value_2']
['Value_1'] is not equal to ['Value_3']

The code compares Value_1 to Value_2 and Value_3 and then stops. The outer loop never picks up Value_2 or Value_3. In short, the outer loop appears to iterate over only the first row of the CSV file before stopping.

Also, I can't compare Value_1 to itself using this code. Any suggestions for the solution?

I would have suggested loading the CSV into memory, but that is not an option considering the size.

Instead, think of it like a SQL join: every row in the left table is matched against every row in the right table. You scan through the left table only once, and you rewind and re-scan the right table for each left row until the left side reaches EOF.

import csv

# csvfile is the path defined in the question
with open(csvfile, newline='') as f_left:
    reader_left = csv.reader(f_left, delimiter=',')
    with open(csvfile, newline='') as f_right:
        reader_right = csv.reader(f_right, delimiter=',')
        for row in reader_left:
            for compare_row in reader_right:
                if row == compare_row:
                    print(row,'is equal to',compare_row)
                else:
                    print(row,'is not equal to',compare_row)
            f_right.seek(0)  # rewind the right-hand file for the next left row
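
The same left/right scan can also be wrapped in a generator, which separates producing the pairs from comparing them. This is only a minimal sketch: the helper name iter_row_pairs and the temporary demo file are made up for illustration, and instead of calling seek(0) it reopens the right-hand file for every left-hand row.

```python
import csv
import os
import tempfile


def iter_row_pairs(path):
    """Yield (row, compare_row) for every ordered pair of rows in a CSV file.

    Hypothetical helper: the right-hand file is reopened for each left-hand
    row, so the file is never loaded into memory as a whole.
    """
    with open(path, newline='') as f_left:
        for row in csv.reader(f_left, delimiter=','):
            with open(path, newline='') as f_right:
                for compare_row in csv.reader(f_right, delimiter=','):
                    yield row, compare_row


# Small demonstration with a temporary three-row file (assumed sample data).
with tempfile.NamedTemporaryFile('w', newline='', suffix='.csv',
                                 delete=False) as tmp:
    csv.writer(tmp).writerows([['Value_1'], ['Value_2'], ['Value_3']])
    demo_path = tmp.name

pairs = list(iter_row_pairs(demo_path))  # fine here: the demo file is tiny
os.remove(demo_path)

for row, compare_row in pairs:
    verdict = 'is equal to' if row == compare_row else 'is not equal to'
    print(row, verdict, compare_row)
```

For a large file you would loop over iter_row_pairs(path) directly rather than building a list, since the number of pairs grows with the square of the row count.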

Try the built-in itertools module:

from itertools import product

with open("abcTest.txt") as inputFile:
    # splitlines() avoids the empty trailing item that split("\n") leaves
    aList = inputFile.read().splitlines()
    aProduct = product(aList, aList)
    for aElem, bElem in aProduct:
        if aElem == bElem:
            print(aElem, 'is equal to', bElem)
        else:
            print(aElem, 'is not equal to', bElem)

What you are describing is the Cartesian product of the rows with themselves: each row of data is compared with itself and with every other row.

If you read from the source multiple times, it will cause a significant performance problem when the file is big. Instead, you can store the data in a list and iterate over it multiple times, but this also carries a large performance overhead.

The itertools module is useful in this case, as it is optimized for this kind of problem.
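
Since the question is about a single column of a CSV file, here is a rough sketch that combines csv.reader with product. The in-memory sample data and the column_index value are assumptions for illustration only. Note that product keeps its inputs in memory, so this variant suits columns that fit in RAM.

```python
import csv
import io
from itertools import product

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("Value_1\nValue_2\nValue_3\n", newline='')

column_index = 0  # assumed: the column of interest is the first one

# Collect the chosen column; "if row" skips blank lines, which csv.reader
# returns as empty lists.
values = [row[column_index] for row in csv.reader(sample) if row]

results = []
# product(values, repeat=2) yields every ordered pair, including (x, x).
for left, right in product(values, repeat=2):
    verdict = 'is equal to' if left == right else 'is not equal to'
    results.append((left, verdict, right))
    print(left, verdict, right)
```

To run it against a real file, replace the StringIO object with a file opened via open(path, newline='').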
