
Variable does not change inside a nested loop in Python

Python beginner here. I need to take each line from the first file, "src.csv" (300 rows), which contains strings like

"12345, a, b"
"234567, e, c"

and find a matching string in the second file, "data.csv" (100k rows), which contains strings like

"12345678"
"23456789011248"

The match is on the first column of the first file: its digits must be a substring of one of the strings in the second file. The matches should then be written to an output file.

import sys
import csv



dat_file_name = "data.dat"
src_file_name = "src.csv"
out_file_name = "out.csv"

if (len(sys.argv) == 4):
    dat_file_name = sys.argv[1]
    src_file_name = sys.argv[2]
    out_file_name = sys.argv[3]


out_writer = open(out_file_name, "w")


i = 0
j = 0
with open(src_file_name, "r") as src, open(dat_file_name, 'r') as dat:
    src_reader = csv.reader(src)
    dat_reader = csv.reader(dat)

    for sub_string in src_reader:

        # print sub_string

        for string in dat_reader:

            out_writer.write(sub_string[0])
            out_writer.write("\n")

            print sub_string[0]

            i+=1
        j+=i


out_writer.close()

print i #for debug only
print j #for debug only

But instead of the expected value of "sub_string[0]", I get the first value of the first row of the first file

12345
12345
...

in each iteration. What's more, the output file contains 100k rows instead of 30 million.

My question is why my nested loops behave unexpectedly. Why does the variable "sub_string[0]" not change inside the nested loop? I would appreciate any help.

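Why should it change in the nested loop? The inner loop iterates over dat_reader, but sub_string comes from the outer loop, so it cannot change until the inner loop has completely finished. Note also that dat_reader can only be consumed once: the first pass of the inner loop reads all 100k rows, and on every later outer iteration the inner loop has nothing left to read, so its body never runs again. That is why the output has 100k rows, all with the first value.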
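
A minimal sketch of what happens in the question's nested loop, written for Python 3 and assuming in-memory data (io.StringIO objects, which are stand-ins I introduce here) instead of the real files. A csv.reader is a one-pass iterator, so the inner loop only yields rows during the first outer iteration:

```python
import csv
import io

# Hypothetical in-memory stand-ins for src.csv and data.csv.
src = io.StringIO("12345, a, b\n234567, e, c\n")
dat = io.StringIO("12345678\n23456789011248\n")

src_reader = csv.reader(src)
dat_reader = csv.reader(dat)

seen = []
for sub_string in src_reader:
    # dat_reader is fully consumed during the first outer iteration;
    # on later outer iterations this inner loop has nothing left to yield.
    for string in dat_reader:
        seen.append((sub_string[0], string[0]))

print(seen)
```

Every recorded pair carries the first key, "12345", because the inner loop body never runs for the second row of the first file.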

You don't want a nested loop at all; you want to loop over both files at once. You can do that with zip:

for sub_string, string in zip(src_reader, dat_reader):
    out_writer.write(sub_string[0])
    out_writer.write("\n")  # keep one value per line, as in the original code

You also don't need the counters i and j at all; remove them.
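
For reference, a small sketch of how zip pairs rows, using made-up lists in place of the two readers (the row values are illustrative assumptions). It pairs row n of one input with row n of the other and stops at the shorter input, so it fits only if row-by-row pairing is what is wanted; it does not compare every key against every data row:

```python
# Hypothetical parsed rows standing in for src_reader and dat_reader.
src_rows = [["12345", " a", " b"], ["234567", " e", " c"]]
dat_rows = [["12345678"], ["23456789011248"], ["99999999"]]

# zip yields one pair per position and stops when the shorter input ends.
pairs = list(zip(src_rows, dat_rows))
paired_keys = [(s[0], d[0]) for s, d in pairs]

print(paired_keys)
```

With 300 rows in one file and 100k in the other, this approach would produce at most 300 pairs.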

Alright, there are a couple of things wrong with this code. First, you don't even check for substrings, and second, your loops are backwards.

import sys
import csv

dat_file_name = "data.dat"
src_file_name = "src.csv"
out_file_name = "out.csv"

if len(sys.argv) == 4:
    dat_file_name = sys.argv[1]
    src_file_name = sys.argv[2]
    out_file_name = sys.argv[3]

with open(src_file_name, "r") as src, open(dat_file_name, "r") as dat, open(out_file_name, "w") as out_writer:
    src_reader = csv.reader(src)
    dat_reader = csv.reader(dat)

    for string in dat_reader:
        for sub_string in src_reader:
            # Each row is a list of fields, so compare the first fields as strings
            if sub_string[0] in string[0]:  # Check if substring in string
                out_writer.write(sub_string[0])
                out_writer.write("\n")

                print(sub_string[0])
        src.seek(0)  # Your file pointer is at the end of the file, so move it back to the beginning
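
An alternative sketch that avoids rewinding the file: since the first file is small (300 rows), its keys can be read into a list once and reused for every data row. This is written for Python 3; find_matches is a hypothetical helper name, and the io.StringIO objects are assumed stand-ins for the real files:

```python
import csv
import io


def find_matches(src_file, dat_file):
    """Return (key, data_string) pairs where key is a substring of data_string.

    Hypothetical helper, not part of the original code.
    """
    # Read the small file once so its keys can be reused for every data row.
    keys = [row[0].strip() for row in csv.reader(src_file) if row]
    matches = []
    for row in csv.reader(dat_file):
        if not row:
            continue  # skip blank lines
        for key in keys:
            if key in row[0]:  # substring test on strings, not on the row list
                matches.append((key, row[0]))
    return matches


# In-memory stand-ins for src.csv and data.csv, using the sample rows above.
src = io.StringIO("12345, a, b\n234567, e, c\n")
dat = io.StringIO("12345678\n23456789011248\n")
result = find_matches(src, dat)
print(result)
```

Note that "234567" also occurs inside "12345678", so the sample data yields three matches rather than two. With real files, the two StringIO objects would be replaced by the handles returned from open().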
