Write each line as it's processed using a for-loop in Python, but only the first line is written

I have some code that I am trying to make more efficient. One part of that is to process my file and, as soon as each line is processed, write it to a CSV file. This is ideal because I am not wasting memory by processing the data and then loading it all into a list just to write the whole list out. If I do append all the processed data to a list, I can write it to CSV without trouble, as shown below under # write folded_data to csv:

Note: the code under # data processing is solid; I just need help writing out each row as it's processed.

# data processing
seen = set()
folded_data = []
for u in name_nodes:
    # seen=set([u]) # print both u-v, and v-u
    seen.add(u) # don't print v-u
    unbrs = set(B[u])
    nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        folded_data.append(row)

# write folded_data to csv
with open('out_file.csv', 'wb') as f:  # 'wb' is the Python 2 mode for csv
    writer = csv.writer(f)
    writer.writerows(folded_data)

However, when I try to write out each row as it's processed, I only get the first line in 'out_file.csv'.

# data processing
seen = set()
for u in name_nodes:
    # seen=set([u]) # print both u-v, and v-u
    seen.add(u) # don't print v-u
    unbrs = set(B[u])   
    nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
    for v in nbrs2:
        vnbrs = set(B[v])
        common = unbrs & vnbrs
        weight = len(common)
        row = u, v, weight
        # write row for each line to csv
        with open('out_file.csv', 'wb') as f:
            writer = csv.writer(f)
            writer.writerow(row)

I've tried moving my writing code around to make this work the way I want, but I haven't been able to figure it out.

I doubt that you're getting the first line; you're getting the last line. Every time you write a row, you reopen the file, which erases its previous contents. Move the file open and the csv writer creation outside of the loop.
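To make the truncation visible, here is a minimal sketch that runs both patterns side by side. It assumes Python 3 (text mode with newline='' rather than the 'wb' used in the question), and the sample rows and helper function names are made up for illustration.

```python
import csv
import os
import tempfile

# Made-up sample rows standing in for the (u, v, weight) tuples.
rows = [("a", "b", 1), ("a", "c", 2), ("b", "c", 3)]


def write_reopening_each_time(path, rows):
    """Buggy pattern: mode 'w' truncates the file on every open."""
    for row in rows:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerow(row)


def write_opening_once(path, rows):
    """Fixed pattern: open once, write every row through the same writer."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)


def read_rows(path):
    """Read a CSV file back as a list of lists of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


tmp_dir = tempfile.mkdtemp()
buggy_path = os.path.join(tmp_dir, "buggy.csv")
fixed_path = os.path.join(tmp_dir, "fixed.csv")

write_reopening_each_time(buggy_path, rows)
write_opening_once(fixed_path, rows)

buggy_rows = read_rows(buggy_path)
fixed_rows = read_rows(fixed_path)

print(buggy_rows)       # [['b', 'c', '3']]  (only the last row survives)
print(len(fixed_rows))  # 3
```

Opening in append mode ('a') inside the loop would also keep every row, but it reopens the file once per row, so opening once outside the loop is the simpler choice.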

I wouldn't worry about "wasting" memory unless your program asks for (e.g.) more than half of the system memory. If your CSV is in the multi-gigabyte range (or bigger), then this is a concern.

If your CSV is not that large, the file will end up in the OS file cache in memory anyway, unless you have some non-standard kernel settings.

To do it the "efficient" way (i.e. without explicitly storing your data in memory), you need to open the file before the for loop.
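One way to avoid holding the rows in a list is to wrap the processing loop in a generator and hand it to writer.writerows, which accepts any iterable. The sketch below assumes Python 3; the function name folded_rows and the toy graph B are hypothetical, B is a plain dict standing in for whatever graph object the question uses, and an in-memory io.StringIO buffer stands in for the opened file. The neighbours are sorted only to make the output order deterministic.

```python
import csv
import io


def folded_rows(B, name_nodes):
    """Yield (u, v, weight) one row at a time instead of building a list."""
    seen = set()
    for u in name_nodes:
        seen.add(u)  # don't emit v-u after u-v
        unbrs = set(B[u])
        nbrs2 = {n for nbr in unbrs for n in B[nbr]} - seen
        for v in sorted(nbrs2):  # sorted only for reproducible output
            weight = len(unbrs & set(B[v]))
            yield u, v, weight


# Hypothetical toy bipartite graph: names on one side, x/y/z on the other.
B = {
    "alice": ["x", "y"],
    "bob": ["y", "z"],
    "carol": ["x", "y", "z"],
    "x": ["alice", "carol"],
    "y": ["alice", "bob", "carol"],
    "z": ["bob", "carol"],
}
name_nodes = ["alice", "bob", "carol"]

# The writer is created once, before any rows are produced.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(folded_rows(B, name_nodes))

output = buf.getvalue()
print(output)
```

With a real file, the same call would sit inside a single with open('out_file.csv', 'w', newline='') as f: block.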

Figured it out with the help of @etep and @MarkRansom! I have to open the file and define writer before the entire for-loop.

# open file and define writer
# ('wb' is the Python 2 mode; on Python 3 use open('out_file.csv', 'w', newline=''))
with open('out_file.csv', 'wb') as f:
    writer = csv.writer(f)

    # data processing
    seen = set()
    for u in name_nodes:
        # seen=set([u]) # print both u-v, and v-u
        seen.add(u) # don't print v-u
        unbrs = set(B[u])
        nbrs2 = set((n for nbr in unbrs for n in B[nbr])) - seen
        for v in nbrs2:
            vnbrs = set(B[v])
            common = unbrs & vnbrs
            weight = len(common)
            row = u, v, weight
            # write row for each record
            writer.writerow(row)
