I'm coming from a JavaScript background, and I know this works in JavaScript, but what's fundamentally different here with Python?
I'm reading a CSV (sample below) and collecting all the values of a column (selected by the index parameter) into a list within the get_min_max function, sorting said list, and returning the first and last values in the list as min and max, respectively.
The first call of get_min_max works great, but the second call fails. What seems to happen is that the values from the second function call get appended to the first list.
How do I prevent the second function call from appending to the same list as the first function call? Clearly, I'm missing something fundamental about Python here.
0,11,23
1,34,67
2,86,99
3,45,21
4,60,98
5,2,123
6,7,12
7,9,0
import csv

f = open("test.csv", "r")
reader = csv.reader(f, delimiter=",")

def get_min_max(reader, index):
    arr = []
    for row in reader:
        arr.append(row[index])
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

get_min_max(reader, 1) # call no. 1
get_min_max(reader, 2) # call no. 2
List index out of range on call no. 2. Returning the list on the second call returns empty list; returning the list on the first call returns list of values from the first call and the second call.
Thanks.
By the second call, the data from reader has already been consumed, so iterating over it again yields nothing.
This illustrates the problem:
>>> f = open("test.csv", "r")
>>> import csv
>>> reader = csv.reader(f, delimiter=",")
>>> list(reader)
[['0', '11', '23'], ['1', '34', '67'], ['2', '86', '99'], ['3', '45', '21'], ['4', '60', '98'], ['5', '2', '123'], ['6', '7', '12'], ['7', '9', '0']]
>>> list(reader)
[]
Possible solutions: you can either cache the file data in a variable, or reopen and read the file inside the function get_min_max.
There are two errors: the one that Anthony mentioned (the reader has already consumed the file), and a second one: you're sorting the numbers as strings, which means that "11" < "2".
To fix it:
import csv

def get_min_max(filename, index):
    # First fix: reopen the file on every call so the reader starts at the first row.
    with open(filename, "r") as f:
        reader = csv.reader(f, delimiter=",")
        arr = []
        for row in reader:
            arr.append(int(row[index])) # <-- second fix
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

print(get_min_max("test.csv", 1)) # prints {'min': 2, 'max': 86}
print(get_min_max("test.csv", 2)) # prints {'min': 0, 'max': 123}
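The string-versus-number ordering described in this answer is easy to check in isolation. The following is only a minimal sketch in Python 3, not part of the original answer: the column list is hard-coded from the second field of the sample CSV, and min() and max() are used at the end simply to show that a full sort isn't required.

```python
# Second field of the sample CSV, as csv.reader returns it (always strings).
column = ["11", "34", "86", "45", "60", "2", "7", "9"]

# Strings compare character by character, so "11" sorts before "2".
as_strings = sorted(column)

# Converting to int first gives numeric ordering.
as_numbers = sorted(int(value) for value in column)

print(as_strings[0], as_strings[-1])   # 11 9
print(as_numbers[0], as_numbers[-1])   # 2 86

# min() and max() give the same answer without sorting the whole list.
numbers = [int(value) for value in column]
print(min(numbers), max(numbers))      # 2 86
```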
It's because you already read through the file. A file object can only be iterated once before it is exhausted; you have to seek back to the beginning of the file using file.seek(0) or cache the data. Also, you should convert those strings to ints, because comparing them as strings causes weird results like "11" < "9".
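A rough sketch of the seek(0) approach, in Python 3, might look like the following. To keep it self-contained, an io.StringIO object holding the sample data stands in for the real open("test.csv") file; that substitution, the values name, and the choice to build a fresh csv.reader after each rewind are assumptions made for illustration.

```python
import csv
import io

# Stand-in for open("test.csv", "r"): an in-memory file with the sample data.
f = io.StringIO(
    "0,11,23\n1,34,67\n2,86,99\n3,45,21\n"
    "4,60,98\n5,2,123\n6,7,12\n7,9,0\n"
)

def get_min_max(f, index):
    f.seek(0)  # rewind so every call starts at the top of the file
    reader = csv.reader(f, delimiter=",")  # fresh reader over the rewound file
    values = [int(row[index]) for row in reader]  # convert so ordering is numeric
    return {"min": min(values), "max": max(values)}

first = get_min_max(f, 1)   # call no. 1
second = get_min_max(f, 2)  # call no. 2
print(first)
print(second)
```

Rewinding inside the function means callers don't have to remember to do it, at the cost of rereading the file on every call.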
The answers above explain why the program fails.
If the file is small (say, less than 10 MB), I suggest first reading the file content into memory and then doing whatever you want with it.
import csv

# Read every row into memory once; the list can be iterated as often as needed.
with open("test.csv", "r") as f:
    rows = [row for row in csv.reader(f, delimiter=",")]

def get_min_max(rows, index):
    arr = []
    for row in rows:
        arr.append(int(row[index])) # convert so values sort numerically
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

print(get_min_max(rows, 1)) # call no. 1
print(get_min_max(rows, 2)) # call no. 2
Or use a generator to decouple reading the file from the calculation, like this:
import csv

def csv_gen(fileName):
    # Each call opens the file again and yields its rows one at a time.
    with open(fileName, "r") as f:
        for row in csv.reader(f, delimiter=","):
            yield row

def get_min_max(rows, index):
    arr = []
    for row in rows:
        arr.append(int(row[index])) # convert so values sort numerically
    arr.sort()
    return {
        "min": arr[0],
        "max": arr[-1]
    }

print(get_min_max(csv_gen("test.csv"), 1)) # call no. 1
print(get_min_max(csv_gen("test.csv"), 2)) # call no. 2