
Read the previous row while reading a csv file in reverse order (Python)

I need to read a csv file from the bottom up and write the data to a text file. The file has information for different combinations of customers, products, and locations; however, it doesn't have all the required information: the rows where Quantity is 0 are missing. The file can be huge, which is why I would rather not rewrite it or use additional lists, since at some point I split it.

What I want to do is, while reading the file backwards, compare the required Period_ids from my list with all the ids for each combination in the csv file, and if an id is missing, read the previous row again (and again) until the id from the file is equal to the required id from the list. (PS: I know I cannot do this with a for loop, but then I am not sure how to still read the file in reverse order and do what I need to do.) Please see the attached image with the given data and the required results (the start of each combination is in green). The method below (shortened for this example) is not quite correct: I get all the rows from the csv file, but without the missing rows. Any help with this logic is appreciated (I would also prefer to modify this existing method somehow, without using libraries like pandas :). Thank you!

def read_file_in_reverse():
    # ... some code

    # Required ids.
    all_required_ids = [412, 411, 410, 409, 408, 407, 406, 405]

    # Needed to count period ids.
    count_index_for_periodid = 0

    # Read csv file.
    with open(('.\myFile.csv'), 'rb') as f:
        time_csv = csv.reader(f)

        # Read the file in reversed order.
        for line in reversed(list(time_csv)):
            # ... some code

                ###### Get quantities from the file.
                for col_num in range(5, 7):
                    # ... code to get items

                    ### quantity
                    # If next id is not equal to the next required id.
                    if str(next_id) != str(all_required_ids[count_index_for_periodid]):
                        list_qty.append(0)
                    else:
                        qty = line[col_num]
                        list_quantity.append(qty)

            # Should add another condition here
            count_index_for_periodid += 1
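The "read the previous row again" step does not actually need a second read. If the rows of one combination are already in the same (descending) order as all_required_ids, you can keep the current row in hand and only move on to the next row once its ID matches the required one, emitting zeros otherwise. Below is a minimal two-pointer sketch of that idea. It assumes Period_id is in column 4 (as in the answer below), the two quantity columns 5 and 6 from the question's range(5, 7), integer quantities, and that every ID in the rows also appears in the required list. fill_missing and the sample rows are hypothetical.

```python
def fill_missing(rows, required_ids, id_col=4, qty_cols=(5, 6)):
    """Return one tuple of quantities per required Period_id for a single
    combination, using zeros where the file has no row for that id.

    Assumes `rows` are in the same order as `required_ids` (descending when
    the file is read backwards) and contain no ids outside that list.
    """
    result = []
    i = 0  # index of the next row not yet matched to a required id
    for required in required_ids:
        if i < len(rows) and int(rows[i][id_col]) == required:
            # The current row has the required id: use it and advance.
            result.append(tuple(int(rows[i][c]) for c in qty_cols))
            i += 1
        else:
            # Missing id: emit zeros and keep the same row for the next
            # required id (instead of "reading the previous row again").
            result.append(tuple(0 for _ in qty_cols))
    return result


# Hypothetical rows for one combination, already in reverse (descending) order.
all_required_ids = [412, 411, 410, 409, 408, 407, 406, 405]
rows = [
    ['1', 'C1', 'P1', 'L1', '411', '7', '8'],
    ['1', 'C1', 'P1', 'L1', '408', '3', '4'],
    ['1', 'C1', 'P1', 'L1', '405', '1', '2'],
]
print(fill_missing(rows, all_required_ids))
# → [(0, 0), (7, 8), (0, 0), (0, 0), (3, 4), (0, 0), (0, 0), (1, 2)]
```

Each row is looked at once per required ID and consumed at most once, so no row ever has to be re-read; the only state carried between iterations is the index i.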


If the file is large, it would be best to avoid reading the whole file into memory at once, which is what reading it backwards requires. Instead, rethink the problem so that the file is parsed forwards. In effect, you are trying to write blocks of rows that contain all of the required Period_id values. So keep reading rows until you find a row whose ID is less than or equal to that of the previous row. At that point you have a complete block, which needs to be expanded to contain any missing rows and then written to a file. For example:

import csv

def write_block(block, csv_output):
    # Write one block, adding a zero-quantity row for each missing Period_id
    if not block:
        return
    fill = block[0][1:4]
    block_dict = {int(row[4]): row for row in block}

    for period_id in range(405, 413):
        try:
            csv_output.writerow(block_dict[period_id])
        except KeyError:
            csv_output.writerow([999] + fill + [period_id, 0, 0, 0])

with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)
    block = [next(csv_input)]

    for row in csv_input:
        # Is the period ID <= the last one that was read?
        if int(row[4]) <= int(block[-1][4]):
            write_block(block, csv_output)
            # Start a new block
            block = [row]
        else:
            block.append(row)

    # Write any remaining entries when the end of file is reached
    write_block(block, csv_output)

write_block() works by taking all the entries found for a block and converting them into a dictionary keyed on the Period_id (column 4). It then looks up each required ID in the dictionary: if it is present, that row is written as-is to the output file. If it is missing, a replacement row is built from the block's combination columns (row[1:4]), the missing ID and zero quantities, with 999 as a placeholder in the first column.
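The block detection above relies on each combination's Period_ids increasing and on the next combination always starting at an ID less than or equal to the last one. If one combination ends at 409 and the next starts at 410, the two are merged into a single block. The sketch below swaps in a different technique: it groups consecutive rows on the combination columns with itertools.groupby, which still streams the file forwards. It assumes the same column layout as the answer (combination in columns 1 to 3, Period_id in column 4, three quantity columns) and that all rows of one combination are adjacent in the file; fill_block, expand and the sample header names are hypothetical.

```python
import csv
import io
from itertools import groupby

REQUIRED_IDS = range(405, 413)  # Period_ids 405..412, as in the answer


def fill_block(block, required_ids=REQUIRED_IDS):
    """Yield one row per required Period_id for a single combination,
    inventing a zero-quantity row (999 placeholder) for each missing id."""
    by_id = {int(row[4]): row for row in block}
    combination = block[0][1:4]  # customer, product, location columns
    for period_id in required_ids:
        yield by_id.get(period_id, [999] + combination + [period_id, 0, 0, 0])


def expand(f_input, f_output):
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    csv_output.writerow(next(csv_input))  # copy the header unchanged
    # groupby merges only *consecutive* rows with the same key, so the file
    # is still streamed forwards one combination at a time.
    for _, rows in groupby(csv_input, key=lambda row: tuple(row[1:4])):
        csv_output.writerows(fill_block(list(rows)))


# Hypothetical sample: C1 ends at 409 and C2 starts at 410.
sample = (
    "Id,Customer,Product,Location,Period_id,Q1,Q2,Q3\n"
    "1,C1,P1,L1,406,5,6,7\n"
    "2,C1,P1,L1,409,1,1,1\n"
    "3,C2,P1,L1,410,2,2,2\n"
)
out = io.StringIO()
expand(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, call expand() on handles opened as open('myFile.csv', newline='') and open('output.csv', 'w', newline=''). Only one combination's rows are held in memory at a time, so the memory benefit of the forward pass is kept.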


If you really want to work on it backwards, simply read the whole file in (using list(csv_input)) and iterate over the entries backwards using [::-1]. The logic then needs to be changed to look for an ID that is >= that of the previous line read. For example:

import csv

def write_block(block, csv_output):
    # Same helper as above: fill in the missing Period_ids of one block
    if not block:
        return
    fill = block[0][1:4]
    block_dict = {int(row[4]): row for row in block}

    for period_id in range(405, 413):
        try:
            csv_output.writerow(block_dict[period_id])
        except KeyError:
            csv_output.writerow([999] + fill + [period_id, 0, 0, 0])

with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)
    block = []

    # The whole file has to be read into memory to walk it backwards
    for row in list(csv_input)[::-1]:
        # Reading backwards, IDs decrease within a block, so an ID >= the
        # previous one means a new block has started
        if block and int(row[4]) >= int(block[-1][4]):
            write_block(block, csv_output)
            block = [row]
        else:
            block.append(row)

    write_block(block, csv_output)

If you add print(row) just after the for statement, you will be able to see that it is working backwards.

Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.