How to use statistics on a specific portion of a list of lists

Question

I have a text file which has the following abbreviated list of 365 entries, each on a single line. The first entry represents a date, and the second a value for the Dow Jones

8/28/2018|26064.01953
8/29/2018|26124.57031
8/30/2018|25986.91992

I am using the following code:

import os
import math
import statistics
def main ():
    infile = open('DJI.txt', 'r')
    values = infile.read()
    infile.close()
    values=values.split("\n")
    values=[value.split("|")for value in values]
    avg = sum([float(l[1]) for l in values])/len(values)
    highest = max([float(l[1]) for l in values])
    lowest = min([float(l[1]) for l in values])
    values.sort(key = lambda x:x[1])
    print(avg)
    print(highest)
    print(lowest)
    print(values)
main()

I am struggling with 2 more tasks on this code, first is to find the Average close value per month, rather than the average value for the whole year.

The second is that for the highest and lowest function, the date which the value occurred should also be displayed with the value.

Your help is greatly appreciated.

Answer 1

Using pandas , this functionality can be achieved fairly easily:

My input file: (note extra month data to check monthly averages)

8/28/2018|26064.01953
8/29/2018|26124.57031
8/30/2018|25986.91992
9/28/2018|26064.01953
9/29/2018|25124.57031
9/30/2018|25986.91992

Reading the input file:

>>> import pandas as pd
>>> df = pd.read_csv("input.txt", '|', header=None, names=["Date", "Dow-Jones Value"], parse_dates=["Date"])
>>> df
        Date  Dow-Jones Value
0 2018-08-28      26064.01953
1 2018-08-29      26124.57031
2 2018-08-30      25986.91992
3 2018-09-28      26064.01953
4 2018-09-29      25124.57031
5 2018-09-30      25986.91992

Retrieving statistics:

>>> df['Dow-Jones Value'].mean() # average
25891.836586666668

>>> df.iloc[df['Dow-Jones Value'].idxmax()] # highest
Date               2018-08-29 00:00:00
Dow-Jones Value                26124.6
Name: 1, dtype: object

>>> df.iloc[df['Dow-Jones Value'].idxmin()] # lowest
Date               2018-09-29 00:00:00
Dow-Jones Value                25124.6
Name: 4, dtype: object

>>> df.sort_values('Dow-Jones Value') # sorted by Dow-Jones Value
        Date  Dow-Jones Value
4 2018-09-29      25124.57031
2 2018-08-30      25986.91992
5 2018-09-30      25986.91992
0 2018-08-28      26064.01953
3 2018-09-28      26064.01953
1 2018-08-29      26124.57031

>>> df.groupby(pd.Grouper(key='Date', freq='M')).mean() # Monthly Averages
            Dow-Jones Value
Date                       
2018-08-31     26058.503253
2018-09-30     25725.169920

Answer 2

The solution below is not using any external library

from collections import defaultdict

monthly_data = defaultdict(list)

with open('DJI.txt') as f:
  lines = [l.strip() for l in f.readlines()]
  for line in lines:
    values = line.split('|')
    date = values[0]
    month = date.split('/')[0]
    value = float(values[1])
    monthly_data[month].append((value,date))
for month,values in monthly_data.items():
  _values = [v[0] for v in values]
  avg = sum(_values)/len(_values)
  _min = min(values, key=lambda x: x[0])
  _max = max(values, key=lambda x: x[0])
  print('Month: {}. avg value {}, min value {}, max value {}'.format(month,avg,_min,_max))

DJI.txt

8/28/2018|26064.01953
8/29/2018|26124.57031
8/30/2018|25986.91992
9/28/2018|16064.01953
9/10/2018|12.99
9/29/2018|16124.57031
9/30/2018|15986.91992 
9/12/2018|999999.91992

output

Month: 8. avg value 26058.503253333332, min value (25986.91992, '8/30/2018'), max value (26124.57031, '8/29/2018')
Month: 9. avg value 209637.68393600002, min value (12.99, '9/10/2018'), max value (999999.91992, '9/12/2018')

How to use statistics on a specific portion of a list of lists

Question

2 answers

solution1
1 ACCPTED 2019-12-02 13:35:49

solution2
1 2019-12-02 14:35:24

How to use statistics on a specific portion of a list of lists

Question

2 answers

solution1 1 ACCPTED 2019-12-02 13:35:49

solution2 1 2019-12-02 14:35:24

solution1
1 ACCPTED 2019-12-02 13:35:49

solution2
1 2019-12-02 14:35:24