Variable does not change in nested loop in Python

Question

Python beginner here. I need to take each line from the first file, "src.csv" (300 rows), which has strings like

"12345, a, b"
"234567, e, c"

and find a matching string in the second file, "data.csv" (100k rows), which has strings like

"12345678"
"23456789011248"

using the first column of the first file, where the digits are a substring of one of the strings in the second file. Then I need to write the result to an output file.

import sys
import csv



dat_file_name = "data.dat"
src_file_name = "src.csv"
out_file_name = "out.csv"

if (len(sys.argv) == 4):
    dat_file_name = sys.argv[1]
    src_file_name = sys.argv[2]
    out_file_name = sys.argv[3]


out_writer = open(out_file_name, "w")


i = 0
j = 0
with open(src_file_name, "r") as src, open(dat_file_name, 'r') as dat:
    src_reader = csv.reader(src)
    dat_reader = csv.reader(dat)

    for sub_string in src_reader:

        # print sub_string

        for string in dat_reader:

            out_writer.write(sub_string[0])
            out_writer.write("\n")

            print sub_string[0]

            i+=1
        j+=i


out_writer.close()

print i #for debug only
print j #for debug only

But instead of the expected value of "sub_string[0]", I get the first value of the first row of the first file...

12345
12345
...

in each iteration. What is more, the output file contains 100k rows instead of 30M.

My question is why my use of nested loops behaves unexpectedly. Why does the variable "sub_string[0]" not change inside the nested loop? I would appreciate any help.

Answer 1

Why should it change in the nested loop? The inner loop iterates over dat_reader, but sub_string comes from the outer iteration, so it cannot change until the inner loop has completely finished.
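A related point that helps explain the 100k rows of output: a csv.reader is a one-shot iterator. The sketch below uses made-up in-memory data (the io.StringIO objects stand in for the two files) to show that the inner loop only yields rows during the first pass of the outer loop, so only the first sub_string[0] is ever paired with anything.

```python
import csv
import io

# Hypothetical in-memory stand-ins for src.csv and data.csv
src = io.StringIO("12345, a, b\n234567, e, c\n")
dat = io.StringIO("12345678\n23456789011248\n")

src_reader = csv.reader(src)
dat_reader = csv.reader(dat)

seen = []
for sub_string in src_reader:
    # dat_reader is consumed during the first outer pass;
    # on every later pass this inner loop body never runs
    for string in dat_reader:
        seen.append((sub_string[0], string[0]))

print(seen)
# [('12345', '12345678'), ('12345', '23456789011248')]
```

The second src row, "234567", never appears in the result because dat_reader is already exhausted by the time the outer loop reaches it.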

You don't want a nested loop at all; you want to loop over both files at once. You can do that with zip:

for sub_string, string in zip(src_reader, dat_reader):
    out_writer.write(sub_string[0])

And you don't need the indexes i and j at all; remove them.
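For reference, here is a small sketch of what zip does with two sequences of rows. The sample rows are made up for illustration. zip pairs the first row of one input with the first row of the other, then the second with the second, and stops at the shorter input, so it yields one pair per position rather than every combination of rows.

```python
# Hypothetical rows, shaped like what csv.reader would yield
src_rows = [["12345", " a", " b"], ["234567", " e", " c"]]
dat_rows = [["12345678"], ["23456789011248"], ["99999999"]]

# One pair per position; the third data row has no partner and is dropped
pairs = [(sub_string[0], string[0]) for sub_string, string in zip(src_rows, dat_rows)]

print(pairs)
# [('12345', '12345678'), ('234567', '23456789011248')]
```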

Answer 2

Alright, there are a couple of things wrong with this code. First, you don't even check for substrings, and second, your loops are backwards.

import sys
import csv

dat_file_name = "data.dat"
src_file_name = "src.csv"
out_file_name = "out.csv"

if len(sys.argv) == 4:
    dat_file_name = sys.argv[1]
    src_file_name = sys.argv[2]
    out_file_name = sys.argv[3]

with open(src_file_name, "r") as src, open(dat_file_name, "r") as dat, open(out_file_name, "w") as out_writer:
    src_reader = csv.reader(src)
    dat_reader = csv.reader(dat)

    for string in dat_reader:
        for sub_string in src_reader:
            # Each row is a list of columns, so compare first column to first column
            if sub_string[0] in string[0]:  # Check if substring in string
                out_writer.write(sub_string[0])
                out_writer.write("\n")

                print(sub_string[0])
        # Your file pointer is at the end of the file, so move it back to the beginning
        src.seek(0)
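As an alternative to rewinding the small file with seek on every pass, one possible sketch is to read its first column into a list once and then stream through the large file a single time. The function name find_matches and the in-memory sample data are hypothetical, chosen only for illustration; with real files you would pass open file objects instead.

```python
import csv
import io

def find_matches(src_file, dat_file):
    """Collect first-column src values found inside the first column of each data row."""
    # The src file is small (about 300 rows), so keep its first column in memory
    prefixes = [row[0] for row in csv.reader(src_file) if row]
    matches = []
    # Read the large data file exactly once
    for row in csv.reader(dat_file):
        if not row:
            continue  # skip blank lines
        for prefix in prefixes:
            if prefix in row[0]:
                matches.append(prefix)
    return matches

# Hypothetical stand-ins for src.csv and data.csv
src = io.StringIO("12345, a, b\n234567, e, c\n")
dat = io.StringIO("12345678\n23456789011248\n")

print(find_matches(src, dat))
# ['12345', '234567', '234567']
```

Note that "234567" matches both data rows here, since "12345678" also contains it, which is why it appears twice.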

2 answers

Answer 1
0 2015-11-19 14:43:51

Answer 2
0 Accepted 2015-11-19 14:59:18
