Python - Find the average for each column in a csv file excluding headers and time

I am reading in a csv file like so:

import csv

with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    data_list = list(reader)

Here data_list is a list of the rows in the csv file. data_list[0] is the first line (i.e. the headers), data_list[1:] holds the actual data rows, and the first element of each data row (for example data_list[1][0]) is the time.

So basically

data_list=
[['','Header1','Header2','Header3'],
['12:02:11', '2.3', '6.2', '11.8'],
['12:05:25', '1.5', '7.5', '13.2'],
['12:10:48', '4.1', '6.8', '12.6'],
['12:13:17', '1.6', '7.1', '12.1']]

I want to find the average of each column, excluding the headers and the time column from the calculation but keeping the headers in the output, with each average rounded to one decimal place. Overall I want to produce something like this:

average_data_list=
[['','Header1','Header2','Header3'],
['', '2.4', '6.9', '12.4']]

I have been using "Python - Calculate average for every column in a csv file" as a guide, but my code keeps throwing errors because I can't get it to skip the headers and the time correctly.

Any help would be much appreciated.

The following should work:

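import csv

# newline='' is what the csv module expects when opening files in Python 3
with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)        # read the header row separately
    data_list = list(reader)     # remaining rows: time followed by values

# Transpose rows into columns, drop the time column, then average each column
columns = list(zip(*data_list))[1:]
rows = [''] + ['{:.1f}'.format(sum(float(x) for x in y) / len(data_list)) for y in columns]
average_data_list = [header] + [rows]

print(average_data_list)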

This would display:

[['', 'Header1', 'Header2', 'Header3'], ['', '2.4', '6.9', '12.4']]

The trick here is to read the header row first so that it does not get in the way. The zip(*data_list) is used to convert your list of rows to a list of columns so that the average can be easily calculated.
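To make the transposition step concrete, here is a small sketch that uses the sample rows from the question, inlined so that no file is needed. The names `rows`, `columns` and `averages` are illustrative only, and the code assumes Python 3.

```python
# Sample data rows from the question (header already removed).
rows = [
    ['12:02:11', '2.3', '6.2', '11.8'],
    ['12:05:25', '1.5', '7.5', '13.2'],
    ['12:10:48', '4.1', '6.8', '12.6'],
    ['12:13:17', '1.6', '7.1', '12.1'],
]

# zip(*rows) groups the i-th element of every row, i.e. it yields columns.
# In Python 3 zip returns an iterator, so wrap it in list() before slicing.
columns = list(zip(*rows))

# columns[0] is the time column; skip it and average the rest.
averages = ['{:.1f}'.format(sum(float(x) for x in col) / len(col))
            for col in columns[1:]]

print(columns[1])   # ('2.3', '1.5', '4.1', '1.6')
print(averages)     # ['2.4', '6.9', '12.4']
```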

You can try:

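# Running totals, one per numeric column (the time column is skipped)
average_data_list = [0.0] * (len(data_list[0]) - 1)

for i, row in enumerate(data_list):
    if i == 0:
        continue  # skip the header row

    for j, value in enumerate(row[1:]):
        average_data_list[j] += float(value)

quantity = len(data_list) - 1  # number of data rows
for i, total in enumerate(average_data_list):
    average_data_list[i] = total / quantity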

First, sum all the values of each column into the array; then iterate over the resulting array to compute each average.

Another option, in case you want to ignore nulls, is to keep a second array of per-column counts and increment it only when a value is present.
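A minimal sketch of that per-column counting idea follows. It assumes that a "null" means an empty cell, and the sample data (with two cells deliberately left empty) and the names `totals` and `counts` are made up for illustration.

```python
# Illustrative data: same layout as the question, with two cells left empty.
data_list = [
    ['', 'Header1', 'Header2', 'Header3'],
    ['12:02:11', '2.0', '6.0', ''],
    ['12:05:25', '', '8.0', '10.0'],
    ['12:10:48', '4.0', '7.0', '14.0'],
]

num_cols = len(data_list[0]) - 1      # exclude the time column
totals = [0.0] * num_cols             # running sum per column
counts = [0] * num_cols               # non-empty values seen per column

for row in data_list[1:]:             # skip the header row
    for j, value in enumerate(row[1:]):
        if value.strip() == '':       # assumption: an empty cell is a null
            continue
        totals[j] += float(value)
        counts[j] += 1

# Guard against a column that has no values at all.
averages = [t / c if c else None for t, c in zip(totals, counts)]
print(averages)   # [3.0, 7.0, 12.0]
```

Dividing by each column's own count, rather than by the number of rows, keeps a missing value from dragging that column's average down.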

How about this: a, b and c accumulate the sums of their respective columns. Then just divide by the number of rows minus one (to ignore the header row) and print each result to a single decimal place.

a, b, c = 0.0, 0.0, 0.0
for i, row in enumerate(data_list):
    if i != 0:  # skip the header row
        a += float(row[1])
        b += float(row[2])
        c += float(row[3])

num_vals = len(data_list) - 1  # because of the header
a /= num_vals
b /= num_vals
c /= num_vals
print("{0:.1f}, {1:.1f}, {2:.1f}".format(a, b, c))

The problem is fairly easy to solve using the csv and statistics modules provided in Python's standard library. The following example loads the data from the CSV file using the DictReader class while simultaneously pivoting the data using the column names. Averaging the data in the columns is accomplished with the mean function, while data conversion is handled via map and float.

#! /usr/bin/env python3
import csv
import statistics


def main():
    with open('data.csv', newline='') as file:
        reader = csv.DictReader(file)
        column = {key: [] for key in reader.fieldnames}
        for row in reader:
            for key in reader.fieldnames:
                column[key].append(row[key])
    print('Header1 Average =', statistics.mean(map(float, column['Header1'])))
    print('Header2 Average =', statistics.mean(map(float, column['Header2'])))
    print('Header3 Average =', statistics.mean(map(float, column['Header3'])))


if __name__ == '__main__':
    main()
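As a possible variation on the DictReader approach, the sketch below loops over every column instead of naming each one, and formats the means to one decimal place as the question asked. The function name `column_averages` is hypothetical, and an in-memory `io.StringIO` stands in for data.csv so the example runs on its own. It assumes the first field is the unnamed time column.

```python
import csv
import io
import statistics


def column_averages(file):
    """Return {column name: mean formatted to one decimal place}."""
    reader = csv.DictReader(file)
    # Assumption: the first field is the unnamed time column, so skip it.
    names = reader.fieldnames[1:]
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            columns[name].append(float(row[name]))
    return {name: '{:.1f}'.format(statistics.mean(values))
            for name, values in columns.items()}


# Stand-in for data.csv, using the sample data from the question.
sample = io.StringIO(
    ',Header1,Header2,Header3\n'
    '12:02:11,2.3,6.2,11.8\n'
    '12:05:25,1.5,7.5,13.2\n'
    '12:10:48,4.1,6.8,12.6\n'
    '12:13:17,1.6,7.1,12.1\n'
)
result = column_averages(sample)
print(result)   # {'Header1': '2.4', 'Header2': '6.9', 'Header3': '12.4'}
```

With a real file, the same function could be called as `column_averages(open('data.csv', newline=''))`, ideally inside a `with` block so the file is closed afterwards.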
