
Read the previous row while reading a csv file in reverse order (Python)

I need to read a csv file from the bottom up and write data to a text file. The file has information for different combinations of customers, products, and locations; however, it does not contain all the required rows - rows where Quantity is 0 are missing. The file can be huge, which is why I would rather not rewrite it or build additional lists, since at some point I split it.

What I want to do is, while reading the file backwards, compare the required Period_ids from my list with the ids for each combination in the csv file, and if an id is missing, read the previous row again (and again) until the id from the file equals the required id from the list. (I know I cannot do this with a plain for loop, but I am not sure how else to read the file in reverse order and do what I need.) Please see the attached image with the given data and the required results (the start of each combination is marked in green). The method below (shortened for this example) is not quite correct, because I get all the rows from the csv file but without the missing rows. Any help with this logic is appreciated. I would also prefer to modify this existing method somehow, without using libraries like pandas :) Thank you!

def read_file_in_reverse(): # ... some code

# Required ids.
all_required_ids = [412, 411, 410, 409, 408, 407, 406, 405]

# Needed to count period ids.
count_index_for_periodid = 0

# Read csv file.
with open(r'.\myFile.csv', newline='') as f:
    time_csv = csv.reader(f)

    # Read the file in reversed order.
    for line in reversed(list(time_csv)):
        # ... some code

            ###### Get quantities from the file.
            for col_num in range(5, 7):
                # ... code to get items

                ### quantity
                # If next id is not equal to the next required id.
                if str(next_id) != str(all_required_ids[count_index_for_periodid]):
                    # Missing id: record a zero quantity.
                    list_quantity.append(0)
                else:
                    qty = line[col_num]
                    list_quantity.append(qty)

        # Should add another condition here      
        count_index_for_periodid += 1 

[Image: the given data and the required results, with the start of each combination marked in green]

If the file is large, it would be best to avoid reading the whole file into memory at once, which would be required in order to read it backwards. Instead, rethink the problem and parse the file forwards. In effect you are trying to write blocks of rows which together contain all the necessary Period_id values. So keep reading rows until you find a row whose ID is less than or equal to the previous row's ID. At that point you have a complete block, which needs to be expanded to contain any missing rows and then written to the file. For example:

import csv

def write_block(block):
    if block:
        fill = block[0][1:4]
        block_dict = {int(row[4]) : row for row in block}

        for row in range(405, 413):
            try:
                csv_output.writerow(block_dict[row])
            except KeyError:
                csv_output.writerow([999] + fill + [row, 0, 0, 0])

with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)
    block = [next(csv_input)]

    for row in csv_input:
        # Is the period ID <= to the last one that was read?
        if int(row[4]) <= int(block[-1][4]):
            write_block(block)
            # Start a new block
            block = [row]
        else:
            block.append(row)

    # Write any remaining entries when the end of file is reached
    write_block(block)

write_block() works by taking all the found entries for a block and converting them into a dictionary keyed by ID. It then attempts to look up each required ID in the dictionary; if the ID is present, the row is written as-is to the output file. If it is missing, a suitable replacement row is created using values taken from the block.
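This dictionary-based gap fill can be tried in isolation. Below is a minimal sketch with made-up sample rows, assuming the column layout from the code above (columns 1-3 identify the combination, column 4 is the Period_id); the ids 406 and 409-412 are deliberately missing:

```python
# Made-up sample block for one combination; Period_ids 406 and 409-412 missing.
block = [
    ['1', 'CustA', 'Prod1', 'Loc1', '405', '10', '2'],
    ['2', 'CustA', 'Prod1', 'Loc1', '407', '12', '3'],
    ['3', 'CustA', 'Prod1', 'Loc1', '408', '7', '1'],
]

# Key the existing rows by their integer Period_id.
block_dict = {int(row[4]): row for row in block}
# Combination columns, reused when synthesising a missing row.
fill = block[0][1:4]

filled = []
for period_id in range(405, 413):
    if period_id in block_dict:
        filled.append(block_dict[period_id])
    else:
        # Missing id: create a zero-quantity placeholder row.
        filled.append(['999'] + fill + [str(period_id), '0', '0'])
```

After the loop, `filled` holds one row per required id, in order 405 through 412, with zero-quantity rows inserted for the gaps.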


If you really want to work backwards, then simply read the whole file into memory (using list(csv_input)) and iterate over the entries in reverse using [::-1]. The logic then needs to be changed to look for IDs greater than or equal to the previous line read, e.g.

import csv

def write_block(block):
    if block:
        fill = block[0][1:4]
        block_dict = {int(row[4]) : row for row in block}

        for row in range(405, 413):
            try:
                csv_output.writerow(block_dict[row])
            except KeyError:
                csv_output.writerow([999] + fill + [row, 0, 0, 0])

with open('myFile.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)
    rows = list(csv_input)
    block = [rows.pop()]  # seed with the last row, since we iterate backwards

    for row in reversed(rows):
        if int(row[4]) >= int(block[-1][4]):
            write_block(block)
            block = [row]
        else:
            block.append(row)

    write_block(block)

If you add `print(row)` after the `for` statement, you will be able to see that it is working backwards.
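As a quick sanity check of the reversed iteration itself, here is a tiny self-contained example (the data is made up, and io.StringIO stands in for an open file) showing that list(reader)[::-1] yields the data rows from last to first once the header has been consumed with next():

```python
import csv
import io

# Made-up csv content in place of a real file.
data = "Period_id,Quantity\n405,10\n406,12\n407,7\n"
reader = csv.reader(io.StringIO(data))

header = next(reader)            # consume the header row first
rows_backwards = list(reader)[::-1]

for row in rows_backwards:
    print(row)                   # rows appear from last to first
```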
