I am reading in a csv file like so:
with open('data.csv', 'rb') as f:
    reader = csv.reader(f)
    data_list = list(reader)
Here data_list is a list of the rows in the csv file. So data_list[0] is the first line in the csv file (i.e. the headers), data_list[1] onwards are the lines containing the actual data, and the first element of each of those rows (e.g. data_list[1][0]) is the time.
So basically
data_list=
[['','Header1','Header2','Header3'],
['12:02:11', '2.3', '6.2', '11.8'],
['12:05:25', '1.5', '7.5', '13.2'],
['12:10:48', '4.1', '6.8', '12.6'],
['12:13:17', '1.6', '7.1', '12.1']]
I want to find the average of each column, excluding the headers and the time from the calculation but keeping the headers in the output, and rounding each average to one decimal place. Overall I want to produce something like this:
average_data_list=
[['','Header1','Header2','Header3'],
['', '2.4', '6.9', '12.4']]
I have been using "Python - Calculate average for every column in a csv file" as a guide, but my code keeps throwing errors because I can't get it to skip the headers and the time column correctly.
Any help would be much appreciated.
The following should work:
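import csv

# Python 2 code: note the 'rb' mode, the sliced zip() result and the print statement
with open('data.csv', 'rb') as f:
    reader = csv.reader(f)
    header = next(reader)     # read the header row separately
    data_list = list(reader)  # remaining rows: time followed by the values

# zip(*data_list) transposes rows into columns; [1:] drops the time column
rows = [''] + ['{:.1f}'.format(sum(float(x) for x in y) / len(data_list)) for y in zip(*data_list)[1:]]
average_data_list = [header] + [rows]
print average_data_list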
This would display:
[['', 'Header1', 'Header2', 'Header3'], ['', '2.4', '6.9', '12.4']]
The trick here is to read the header row first so that it does not get in the way. zip(*data_list) is then used to convert your list of rows into a list of columns so that the averages can be easily calculated.
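The snippet above relies on Python 2 behaviour: in Python 3, zip() returns an iterator, so it cannot be sliced directly. A minimal Python 3 sketch of the same transposition idea follows, assuming the rows have already been read and using the sample values from the question in place of the file:

```python
# Sample rows copied from the question (header row already separated out).
header = ['', 'Header1', 'Header2', 'Header3']
data_list = [
    ['12:02:11', '2.3', '6.2', '11.8'],
    ['12:05:25', '1.5', '7.5', '13.2'],
    ['12:10:48', '4.1', '6.8', '12.6'],
    ['12:13:17', '1.6', '7.1', '12.1'],
]

# zip(*data_list) yields one tuple per column; list(...) makes the result
# sliceable in Python 3 so the time column (index 0) can be dropped.
columns = list(zip(*data_list))[1:]

# Average each remaining column and format it to one decimal place.
averages = ['{:.1f}'.format(sum(float(x) for x in col) / len(col)) for col in columns]

average_data_list = [header, [''] + averages]
print(average_data_list)
```

Wrapping zip() in list() is the only structural change; the averaging expression is otherwise the same.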
You can try:
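# One running total per numeric column (the time column is excluded)
average_data_list = [0.0] * (len(data_list[0]) - 1)
for i, row in enumerate(data_list):
    if i == 0:
        continue  # skip the header row
    for j, value in enumerate(row[1:]):  # row[1:] skips the time column
        average_data_list[j] += float(value)  # csv values are strings
quantity = len(data_list) - 1  # number of data rows
for i, total in enumerate(average_data_list):
    average_data_list[i] = total / quantity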
First you sum the values of each column into the list; then you iterate over the resulting list, dividing each sum by the number of rows to get the average.
Another option would be to keep a second list with a count per column and increment it at each step, in case you want to ignore nulls.
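The per-column count idea could look roughly like the sketch below. It assumes that a "null" shows up as an empty string in the csv data, which the question does not state, and the row with the missing value is made up for illustration:

```python
data_list = [
    ['', 'Header1', 'Header2', 'Header3'],
    ['12:02:11', '2.3', '6.2', '11.8'],
    ['12:05:25', '', '7.5', '13.2'],   # hypothetical row with a missing value
    ['12:10:48', '4.1', '6.8', '12.6'],
]

n_cols = len(data_list[0]) - 1   # numeric columns only, time excluded
totals = [0.0] * n_cols          # running sum per column
counts = [0] * n_cols            # number of non-empty values per column

for row in data_list[1:]:                  # skip the header row
    for j, value in enumerate(row[1:]):    # skip the time column
        if value.strip() == '':
            continue                       # assumed null marker: empty string
        totals[j] += float(value)
        counts[j] += 1

# Divide each sum by its own count; a column with no values yields None.
averages = [t / c if c else None for t, c in zip(totals, counts)]
print(averages)
```

Each column is divided by the number of values it actually contained, so a missing entry does not drag the average down.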
How about this? a, b and c accumulate the sums of their respective columns. Then just divide each by the number of rows minus one (ignoring the header row) and print the results to a single decimal place.
a,b,c = 0,0,0
for i, row in enumerate(data_list):
    if i != 0:
        a += float(row[1])
        b += float(row[2])
        c += float(row[3])
num_vals = len(data_list) - 1 #because of the header
a /= num_vals
b /= num_vals
c /= num_vals
print "{0:.1f} , {1:.1f}, {2:.1f}".format(a,b,c)
The problem is fairly easy to solve using the csv and statistics modules provided in Python's standard library. The following example loads the data from the CSV file using the DictReader class while simultaneously pivoting the data using the column names. Averaging the data in the columns is accomplished with the mean function while data conversion is handled via map and float.
#! /usr/bin/env python3
import csv
import statistics


def main():
    with open('data.csv', newline='') as file:
        reader = csv.DictReader(file)
        column = {key: [] for key in reader.fieldnames}
        for row in reader:
            for key in reader.fieldnames:
                column[key].append(row[key])
    print('Header1 Average =', statistics.mean(map(float, column['Header1'])))
    print('Header2 Average =', statistics.mean(map(float, column['Header2'])))
    print('Header3 Average =', statistics.mean(map(float, column['Header3'])))


if __name__ == '__main__':
    main()
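The same DictReader and statistics.mean approach can also be arranged to produce the list-of-lists output asked for in the question. This is only a sketch: the helper name column_averages is made up for illustration, and an in-memory string stands in for data.csv so the example is self-contained.

```python
import csv
import io
import statistics

# Stand-in for data.csv, built from the sample rows in the question.
SAMPLE = """\
,Header1,Header2,Header3
12:02:11,2.3,6.2,11.8
12:05:25,1.5,7.5,13.2
12:10:48,4.1,6.8,12.6
12:13:17,1.6,7.1,12.1
"""


def column_averages(file):
    """Return [header_row, averages_row] for an open csv file object."""
    reader = csv.DictReader(file)
    names = reader.fieldnames   # ['', 'Header1', 'Header2', 'Header3']
    rows = list(reader)
    # The first column (empty header) holds the time, so its slot stays blank.
    averages = [''] + [
        '{:.1f}'.format(statistics.mean(float(row[name]) for row in rows))
        for name in names[1:]
    ]
    return [names, averages]


average_data_list = column_averages(io.StringIO(SAMPLE))
print(average_data_list)
```

To run it against the real file, pass open('data.csv', newline='') instead of the StringIO object.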