How to print the average, min and max for a specific year from a CSV file?

Question

I need help with this exercise. I have a CSV file like this one, all in one column, but with 16000 entries:

Entity,Code,Year,Life expectancy (years)
Afghanistan,AFG,1950,27.638
Afghanistan,AFG,1951,27.878
Serbia,SRB,1995,71.841
Zimbabwe,ZWE,2019,61.49

I need to print the following. I already have the first two parts working.

✅What is the year and country that has the lowest life expectancy in the dataset?
✅What is the year and country that has the highest life expectancy in the dataset?
❌Allow the user to type in a year, then, find the average life expectancy for that year. Then find the country with the minimum and the one with the maximum life expectancy for that year.

So far I'm here, and I need some help getting the last part to print something like this for the year entered by the user:

For the year 1959:

The average life expectancy across all countries was 54.95

The max life expectancy was in Norway with 73.49

The min life expectancy was in Mali with 28.077

import csv
print ("Enter a year to find the average life expectancy for that year: ")
input_year = input ("\n""YEAR: ")


#Allow the user to type in a year, then, find the average life expectancy for that year
def subset_year(all_data, selected_year):
    year_only = []
    for entity, code, year, expectancy in all_data:
        if year == selected_year:
            year_only.append((entity, code, year, expectancy))
    return year_only

def pr_row(headers, row):
    return ", ".join(f"{label}:{value}" for label, value in zip(headers, row))

data = []
with open(r"C:\Users\X230\Desktop\Python class\life-expectancy.csv") as database:
    reader = csv.reader(database)
    # the first row in the CSV file is: Entity,Code,Year,Life expectancy (years)
    # example of the data in the CSV file: Afghanistan,AFG,1950,27.638
    parts = next(reader)
    for line in reader:
        # print(line) #this prints everything, not very useful so I removed it
        # Save the parts I need into variables
        entity, code, year, expectancy = line
        data.append([entity, code, int(year), float(expectancy)])

def key_year(row):
    return row[3]
print()
print("The overall max life expectancy is: ", pr_row(parts, max(data, key=key_year)))
print("The overall min life expectancy is: ", pr_row(parts, min(data, key=key_year)))
print("The average life expectancy across all countries was: ", ) #??????????????
print("The max life expectancy was in: ", ) #????????????????????????????????????
print("The min life expectancy was in: ", ) #????????????????????????????????????
year = input_year
all_by_year = subset_year(data, year)
print(all_by_year)

Answer 1

The first thing I would do is iterate over the lines of the file and keep only the rows for the requested year. Then I would use the map() function to get the life expectancy of each row (without having to write a for loop over them). Passing the result to the min() and max() functions gives you the minimum and maximum values.
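To make the map() step concrete, here is a minimal sketch using the four sample rows from the question. It also shows why the values should be converted with float() before calling min() and max(): csv.reader yields strings, and strings compare character by character. The two values in the last lines are made up purely to illustrate that pitfall.

```python
# Rows as csv.reader yields them: every field is a string.
rows = [
    ["Afghanistan", "AFG", "1950", "27.638"],
    ["Afghanistan", "AFG", "1951", "27.878"],
    ["Serbia", "SRB", "1995", "71.841"],
    ["Zimbabwe", "ZWE", "2019", "61.49"],
]

# map() applies the conversion to each row without an explicit for loop.
expectancies = list(map(lambda row: float(row[3]), rows))

lowest = min(expectancies)
highest = max(expectancies)
print(lowest, highest)  # prints 27.638 71.841

# Illustrative values only (not from the dataset):
# as strings, '10.2' sorts before '9.5' because '1' < '9'.
print(min(["9.5", "10.2"]))  # prints 10.2
print(min([9.5, 10.2]))      # prints 9.5
```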

For the average, I just used sum() to add up all the selected values and divided that by the number of values. You can also use mean() from the statistics module, but you need to import it first.
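As a small sketch of the two ways to compute the average, the snippet below runs both on the four life expectancy values from the question's sample rows. The variable names are only for illustration. Note that both approaches fail on an empty list (sum()/len() raises ZeroDivisionError, mean() raises statistics.StatisticsError), so it is worth checking that the year matched at least one row.

```python
from statistics import mean

# Life expectancy values from the sample rows in the question.
values = [27.638, 27.878, 71.841, 61.49]

if values:
    manual_average = sum(values) / len(values)  # sum divided by count
    library_average = mean(values)              # same result via the statistics module
    print(f"{manual_average:.2f}")   # prints 47.21
    print(f"{library_average:.2f}")  # prints 47.21
else:
    print("No values to average")
```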

Finally, the code iterates over the selected rows until it finds the rows that contain the minimum and maximum values, so that it can print each value along with the country of that row.

import csv

year_input = input('Enter the year: ')

with open('data.csv','r') as file:
    reader = csv.reader(file)
    lines = []
    
    # We iterate over the lines in the file, excluding the header of course.
    # If the year matches the user input, then we append that row to a list.
    
    for line in reader:
        if line == ['Entity', 'Code', 'Year', 'Life expectancy (years)']:
            pass
        else:
            if line[2] == year_input:
                lines.append(line)
                
    # Once we have that list, we convert the life expectancy values to floats so that
    # they are compared numerically (as strings, '10.2' would sort before '9.5').

    values = list(map(lambda x: float(x[3]), lines))

    if not values:
        # No row matched the year, so there is nothing to average.
        print('No data found for the year ' + year_input)
    else:
        average = sum(values) / len(values)
        minimum = min(values)
        maximum = max(values)

        print('For the year ' + year_input + ':')
        print('The average life expectancy across all countries was ' + str(average))

        # Now, we iterate over the rows we selected before until we find the minimum and maximum value.
        # When we find the row with the minimum/maximum value, we print it next to the country on that same row.

        for line in lines:
            if float(line[3]) == minimum:
                print('The min life expectancy was in ' + line[0] + ' with ' + line[3])
            if float(line[3]) == maximum:
                print('The max life expectancy was in ' + line[0] + ' with ' + line[3])
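If you would rather keep the structure from the question, where the rows are already stored as [entity, code, int(year), float(expectancy)], the sketch below may help. Two things to note: input() returns a string, so it has to be converted with int() before it can match the integer years in the list, and min()/max() accept a key argument that returns the whole row, which avoids the second loop. The function name summarize_year is hypothetical, and the literal "1950" stands in for the user's input.

```python
from statistics import mean


def summarize_year(rows, selected_year):
    """Return (average, min_row, max_row) for one year, or None if no row matches.

    rows holds [entity, code, year, expectancy] with year as int and
    expectancy as float, like the `data` list built in the question.
    """
    year_rows = [row for row in rows if row[2] == selected_year]
    if not year_rows:
        return None
    average = mean(row[3] for row in year_rows)
    # key= makes min()/max() compare rows by life expectancy and return the full row.
    min_row = min(year_rows, key=lambda row: row[3])
    max_row = max(year_rows, key=lambda row: row[3])
    return average, min_row, max_row


# The four sample rows from the question, already converted to int/float.
data = [
    ["Afghanistan", "AFG", 1950, 27.638],
    ["Afghanistan", "AFG", 1951, 27.878],
    ["Serbia", "SRB", 1995, 71.841],
    ["Zimbabwe", "ZWE", 2019, 61.49],
]

year_text = "1950"               # stands in for input("YEAR: ")
selected_year = int(year_text)   # convert, otherwise "1950" == 1950 is False

result = summarize_year(data, selected_year)
if result is None:
    print(f"No data for the year {selected_year}")
else:
    average, min_row, max_row = result
    print(f"For the year {selected_year}:")
    print(f"The average life expectancy across all countries was {average}")
    print(f"The max life expectancy was in {max_row[0]} with {max_row[3]}")
    print(f"The min life expectancy was in {min_row[0]} with {min_row[3]}")
```

With the real file, you would pass the full `data` list and the converted user input instead of the sample values.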

solution1
0 2022-07-08 04:07:10

How to print the average, min and max for a specific year from a CSV file?

Question

1 answers

solution1 0 2022-07-08 04:07:10

solution1
0 2022-07-08 04:07:10