Python: multiple function call appends to same list

I'm coming from a JavaScript background, and I know this works in JavaScript, but what's fundamentally different here in Python?

I'm reading a CSV (sample below) and collecting all the values of one column (chosen by the index parameter) into a list inside the get_min_max function, sorting that list, and returning its first and last values as the min and max, respectively.

The first call of get_min_max works great, but the second call fails. What happens is that the values from the second function call get appended to the first list.

How do I prevent the second function call from appending to the same list as the first function call? Clearly, I'm missing something fundamental about Python here.

Sample CSV

0,11,23
1,34,67
2,86,99
3,45,21
4,60,98
5,2,123
6,7,12
7,9,0

Sample Code

import csv

f = open("test.csv", "r")

reader = csv.reader(f, delimiter=",")

def get_min_max(reader, index):
    arr=[]
    for row in reader:
        arr.append(row[index])
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

get_min_max(reader, 1) # call no. 1
get_min_max(reader, 2) # call no. 2

ERROR

"List index out of range" on call no. 2. Returning the list on the second call returns an empty list; returning the list on the first call returns a list of values from both the first call and the second call.

Thanks.

By the time of the second call, the first call has already consumed all the data from reader, so iterating over it again yields no rows.

This illustrates the problem:

>>> f = open("test.csv", "r")
>>> import csv
>>> reader = csv.reader(f, delimiter=",")
>>> list(reader)
[['0', '11', '23'], ['1', '34', '67'], ['2', '86', '99'], ['3', '45', '21'], ['4', '60', '98'], ['5', '2', '123'], ['6', '7', '12'], ['7', '9', '0']]
>>> list(reader)
[]

Possible solutions: either cache the file data in a variable, or reopen and reread the file inside get_min_max.

There are two errors: the one Anthony mentioned (the reader has already consumed the file), and a second one: you're sorting the numbers as strings, which means that "11" < "2".

To fix it:

import csv

def get_min_max(filename, index):
    # First fix: open the file inside the function so every call reads it from the start.
    with open(filename, "r") as f:
        reader = csv.reader(f, delimiter=",")
        arr = []
        for row in reader:
            arr.append(int(row[index]))  # <-- second fix: compare numbers, not strings
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

print(get_min_max("test.csv", 1))  # prints {'min': 2, 'max': 86}
print(get_min_max("test.csv", 2))  # prints {'min': 0, 'max': 123}
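To make the string-sorting problem concrete, here is a small sketch using the second column of the sample CSV. The variable names are only illustrative; the point is that csv.reader hands back strings, and strings sort character by character rather than by numeric value.

```python
# Column 1 of the sample CSV, exactly as csv.reader would return it: strings.
as_strings = ["11", "34", "86", "45", "60", "2", "7", "9"]
as_ints = [int(value) for value in as_strings]

string_sorted = sorted(as_strings)  # lexicographic order
int_sorted = sorted(as_ints)        # numeric order

print(string_sorted[0], string_sorted[-1])  # 11 9
print(int_sorted[0], int_sorted[-1])        # 2 86
```

Sorted as strings, the "min" comes out as "11" and the "max" as "9", which is why the int() conversion is needed.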

It's because you have already read through the file. A file object can only be iterated over once before it is exhausted. You have to seek back to the beginning of the file using f.seek(0), or cache the data. You should also convert those strings to ints, because comparing them as strings causes odd results such as "11" < "9".
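A minimal sketch of the seek(0) approach, under a couple of assumptions: io.StringIO stands in for open("test.csv", "r") so the snippet is self-contained, and a fresh csv.reader is created after each rewind rather than reusing the old one.

```python
import csv
import io

# Stand-in for open("test.csv", "r"); used here only so the sketch runs on its own.
f = io.StringIO("0,11,23\n1,34,67\n2,86,99\n3,45,21\n4,60,98\n5,2,123\n6,7,12\n7,9,0\n")

def get_min_max(f, index):
    f.seek(0)  # rewind so every call starts at the first row
    reader = csv.reader(f, delimiter=",")
    arr = sorted(int(row[index]) for row in reader)  # convert to int before sorting
    return {"min": arr[0], "max": arr[-1]}

first = get_min_max(f, 1)
second = get_min_max(f, 2)
print(first)
print(second)
```

Rewinding avoids reopening the file, but the caller still has to keep the file object open for as long as the function may be called.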

The answers above explain why the program fails.
If the file is small (less than 10 MB), I suggest you first read the file content into memory and then do whatever you want with it.

import csv

# Read every row into memory once; the list can be iterated any number of times.
with open("test.csv", "r") as f:
    rows = list(csv.reader(f, delimiter=","))

def get_min_max(rows, index):
    arr = []
    for row in rows:
        arr.append(int(row[index]))  # convert to int so the sort is numeric
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

print(get_min_max(rows, 1))  # call no. 1
print(get_min_max(rows, 2))  # call no. 2

Or use a generator to decouple reading the file from the calculation, so that each call gets a fresh pass over the file:

import csv

def csv_gen(fileName):
    # Each call returns a new generator that reopens the file and yields its rows.
    with open(fileName, "r") as f:
        for row in csv.reader(f, delimiter=","):
            yield row

def get_min_max(rows, index):
    arr = []
    for row in rows:
        arr.append(int(row[index]))  # convert to int so the sort is numeric
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

print(get_min_max(csv_gen("test.csv"), 1))  # call no. 1
print(get_min_max(csv_gen("test.csv"), 2))  # call no. 2
