Finding common regions in two CSV files in Python

Question

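I have two CSV files, each with 10 columns, where the first column is called the "primary key".

I need to find the common regions between the two CSV files using Python. For example, I should be able to detect that rows 27-45 in CSV1 are equal to rows 125-145 in CSV2, and so on.

I am comparing only the primary key (the first column). The rest of the data is not considered in the comparison. I need to extract these common regions into two separate CSV files (one for CSV1 and one for CSV2).

I have already parsed the rows of the two CSV files and stored them in two "lists of lists", lstCAN_LOG_TABLE and lstSHADOW_LOG_TABLE, so the problem reduces to comparing these two lists of lists.

My current assumption is that if 10 consecutive matches follow (MAX_COMMON_THRESHOLD), I have reached the start of a common region. I cannot simply log single rows (those that compare as true), because there will be whole regions that are equal (by primary key), and those regions are what I need to identify.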

for index in range(len(lstCAN_LOG_TABLE)):
    for l_index in range(len(lstSHADOW_LOG_TABLE)):
        if(lstSHADOW_LOG_TABLE[l_index][1] == lstCAN_LOG_TABLE[index][1]):  #Consider for comparison only CAN IDs
            index_can_log = index                                           #Position where CAN Log is to be compared
            index_shadow_log = l_index                                      #Position from where CAN Shadow Log is to be considered
            start = index_shadow_log
            if((index_shadow_log + MAX_COMMON_THRESHOLD) <= (input_file_two_row_count-1)):
                end = index_shadow_log + MAX_COMMON_THRESHOLD
            else:
                end = (index_shadow_log) + ((input_file_two_row_count-1) - (index_shadow_log))
            can_index = index
            bPreScreened = 1
            for num in range(start,end):
                if(lstSHADOW_LOG_TABLE[num][1] == lstCAN_LOG_TABLE[can_index][1]):
                    if((can_index + 1) < (input_file_one_row_count-1)):
                        can_index = can_index + 1                           
                    else:
                        break   
                else:
                    bPreScreened = 0
                    print("No Match")
                    break
            #we might have found start of common region         
            if(bPreScreened == 1):      
                print("Start={0} End={1} can_index={2}".format(start,end,can_index))
                for number in range(start,end):
                    if(lstSHADOW_LOG_TABLE[number][1] == lstCAN_LOG_TABLE[index][1]):                           
                        writer_two.writerow(lstSHADOW_LOG_TABLE[number][0])
                        writer_one.writerow(lstCAN_LOG_TABLE[index][0])
                        if((index + 1) < (input_file_one_row_count-1)):
                            index = index + 1                           
                        else:
                            dump_file.close()   
                            print("\nCommon Region in Two CSVs identifed and recorded\n")                           
                            return
dump_file.close()   
print("\nCommon Region in Two CSVs identifed and recorded\n")

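I am getting strange output. Even though the first CSV file has only 1880 rows, I get many more entries than that in the common-region CSV for the first file. I am not getting the desired output.

EDITED FROM HERE

CSV1: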

216 0.000238225 F4  41  C0  FB  28  0   0   0   MS CAN
109 0.0002256   15  8B  31  0   8   43  58  0   HS CAN
216 0.000238025 FB  47  C6  1   28  0   0   0   MS CAN
340 0.000240175 0A  18  0   C2  0   0   6F  FF  MS CAN
216 0.000240225 24  70  EF  28  28  0   0   0   MS CAN
216 0.000236225 2B  77  F7  2F  28  0   0   0   MS CAN
216 0.0002278   31  7D  FD  35  28  0   0   0   MS CAN

CSV2:

216 0.0002361   0F  5C  DB  14  28  0   0   0   MS CAN
216 0.000236225 16  63  E2  1B  28  0   0   0   MS CAN
109 0.0001412   16  A3  31  0   8   63  58  0   HS CAN
216 0.000234075 1C  6A  E9  22  28  0   0   0   MS CAN
40A 0.000259925 C1  1   46  54  30  44  47  36  HS CAN
4A  0.000565975 2   0   0   0   0   0   0   C0  MS CAN
340 0.000240175 0A  18  0   C2  0   0   6F  FF  MS CAN
216 0.000240225 24  70  EF  28  28  0   0   0   MS CAN
216 0.000236225 2B  77  F7  2F  28  0   0   0   MS CAN
216 0.0002278   31  7D  FD  35  28  0   0   0   MS CAN

Expected output CSV1:

340 0.000240175 0A  18  0   C2  0   0   6F  FF  MS CAN
216 0.000240225 24  70  EF  28  28  0   0   0   MS CAN
216 0.000236225 2B  77  F7  2F  28  0   0   0   MS CAN
216 0.0002278   31  7D  FD  35  28  0   0   0   MS CAN

Expected output CSV2:

340 0.000240175 0A  18  0   C2  0   0   6F  FF  MS CAN
216 0.000240225 24  70  EF  28  28  0   0   0   MS CAN
216 0.000236225 2B  77  F7  2F  28  0   0   0   MS CAN
216 0.0002278   31  7D  FD  35  28  0   0   0   MS CAN

Observed output CSV1:

340 0.000240175 0A  18  0   C2  0   0   6F  FF  MS CAN
216 0.000240225 24  70  EF  28  28  0   0   0   MS CAN
216 0.000236225 2B  77  F7  2F  28  0   0   0   MS CAN
216 0.0002278   31  7D  FD  35  28  0   0   0   MS CAN

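and thousands of redundant rows of data.

Edited - solved as suggested (changed for to while):

Lesson learned: in Python, changing a for loop's index variable inside the loop body does not change where the iteration continues.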

dump_file=open("MATCH_PATTERN.txt",'w+')
print("Number of Entries CAN LOG={0}".format(len(lstCAN_LOG_TABLE)))
print("Number of Entries SHADOW LOG={0}".format(len(lstSHADOW_LOG_TABLE)))  
index = 0   
while(index < (input_file_one_row_count - 1)):
    l_index = 0
    while(l_index < (input_file_two_row_count - 1)):
        if(lstSHADOW_LOG_TABLE[l_index][1] == lstCAN_LOG_TABLE[index][1]):  #Consider for comparison only CAN IDs
            index_can_log = index                                           #Position where CAN Log is to be compared
            index_shadow_log = l_index                                      #Position from where CAN Shadow Log is to be considered
            start = index_shadow_log
            can_index = index
            if((index_shadow_log + MAX_COMMON_THRESHOLD) <= (input_file_two_row_count-1)):
                end = index_shadow_log + MAX_COMMON_THRESHOLD
            else:
                end = (index_shadow_log) + ((input_file_two_row_count-1) - (index_shadow_log))              
            bPreScreened = 1
            for num in range(start,end):
                if(lstSHADOW_LOG_TABLE[num][1] == lstCAN_LOG_TABLE[can_index][1]):                      
                    if((can_index + 1) < (input_file_one_row_count-1)):
                        can_index = can_index + 1                           
                    else:
                        break   
                else:
                    bPreScreened = 0
                    break
            #we might have found start of common region         
            if(bPreScreened == 1):      
                print("Shadow Start={0} Shadow End={1} CAN INDEX={2}".format(start,end,index))
                for number in range(start,end):
                    if(lstSHADOW_LOG_TABLE[number][1] == lstCAN_LOG_TABLE[index][1]):                           
                        writer_two.writerow(lstSHADOW_LOG_TABLE[number][0])
                        writer_one.writerow(lstCAN_LOG_TABLE[index][0])
                        if((index + 1) < (input_file_one_row_count-1)):
                            index = index + 1
                        if((l_index + 1) < (input_file_two_row_count-1)):
                            l_index = l_index + 1                               
                        else:
                            dump_file.close()   
                            print("\nCommon Region in Two CSVs identifed and recorded\n")                           
                            return
            else:
                l_index = l_index + 1
        else:
            l_index = l_index + 1
    index = index + 1   
dump_file.close()   
print("\nCommon Region in Two CSVs identifed and recorded\n")

Answer 1

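index is the iteration variable of the for loop. If you change it inside the loop, it is reassigned at the start of the next iteration.

Suppose index = 5 in the for loop and index += 1 is executed 3 times. Now index = 8. But once this iteration finishes and your code returns to the for statement, index is assigned the value 6.

Try the following example: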

for index in range(0, 5):
    print('iterator:', index)
    index = index + 2  # overwritten by the for statement on the next pass
    print('index:', index)

The output will be:

iterator: 0
index: 2
iterator: 1
index: 3
iterator: 2
index: 4
iterator: 3
index: 5
iterator: 4
index: 6

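To fix this, you may want to change the for loop to a while loop.

EDIT: If I have understood correctly, you are trying to find the "same" rows in the two files and store them. If that is the case, you can actually do the job easily with the following code: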
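As a minimal sketch of that change (not taken from the question's code), the loop below uses while, so the value the body assigns to index is still in effect on the next pass. The function name visited_indices and its parameters are hypothetical, chosen only for illustration.

```python
def visited_indices(limit, step):
    """Return the index values visited when the loop body advances index by step."""
    visited = []
    index = 0
    while index < limit:
        visited.append(index)
        # Unlike in a for loop over range(), this assignment is not
        # overwritten before the next pass
        index = index + step
    return visited

print(visited_indices(5, 2))  # prints [0, 2, 4]
```

With for index in range(0, 5) the body would run for 0, 1, 2, 3 and 4 regardless of the assignment; here it runs only for 0, 2 and 4.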

import csv  # csv module for reading and writing CSV files

file1 = 'csv1.csv'    # input file 1
file2 = 'csv2.csv'    # input file 2
outfile = 'csv3.csv'  # only one output file, since the two outputs would be the same

# csv.reader and csv.writer objects have no close() method, so the
# underlying files are opened and closed with "with" blocks instead
with open(file2, 'r', newline='') as f2:
    rows2 = list(csv.reader(f2))  # read input file 2 once, not once per row of file 1

with open(file1, 'r', newline='') as f1, open(outfile, 'w', newline='') as out:
    write = csv.writer(out)
    # for each row in input file 1, compare it with each row in input file 2
    # if they are the same, write that row into the output file
    for row1 in csv.reader(f1):
        for row2 in rows2:
            if row1 == row2:
                write.writerow(row1)
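The comparison above matches whole rows one at a time. If the goal is instead contiguous runs of matching primary keys, as the question describes, one possible approach (a swapped-in technique, not part of the original answer) is difflib.SequenceMatcher from the standard library. The sketch below assumes the keys of each file have already been collected into a list; common_regions and min_length are hypothetical names, with min_length playing the role of the question's MAX_COMMON_THRESHOLD.

```python
from difflib import SequenceMatcher

def common_regions(keys1, keys2, min_length=1):
    """Return (start1, start2, length) for runs of equal keys found in both lists."""
    # autojunk=False switches off the heuristic that ignores very frequent
    # items, which matters because IDs such as 216 repeat often
    matcher = SequenceMatcher(None, keys1, keys2, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel; max(..., 1) drops it
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks()
            if m.size >= max(min_length, 1)]

# First-column keys of the CSV1 and CSV2 samples from the question
keys1 = ['216', '109', '216', '340', '216', '216', '216']
keys2 = ['216', '216', '109', '216', '40A', '4A', '340', '216', '216', '216']

print(common_regions(keys1, keys2, min_length=4))  # prints [(3, 6, 4)]
```

Each tuple gives zero-based start positions in the two lists and the run length, so the rows to write out would be rows 3 to 6 of the first list and rows 6 to 9 of the second. SequenceMatcher reports blocks that occur in the same relative order in both sequences, so regions that appear in swapped order between the files would need a different approach.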

Answer 1: accepted, 2014-07-28 15:09:48
