I'm working in Python 3, trying to compute the mean of measurements stored in a JSON dictionary of contaminants in a well. When I run the code, it prints a mean for each line of data. What I actually want is a single mean over all results for one contaminant; there are multiple results for the same contaminant within each year.
import statistics

for plants in data:
    for year in ["2010", "2011", "2012", "2013", "2014"]:
        arsenic_values = []
        manganese_values = []
        all_year_data = data[plants][year]
        for measurement in all_year_data:
            if measurement['contaminent'] == "arsenic":
                arsenic_values.append(float(measurement["concentration"]))
                arsenic_mean = statistics.mean(arsenic_values)
                print(plants, year, arsenic_mean)
Here's an example of what the JSON looks like for 2 years.
"well1": {
"2010": [],
"2011": [
{
"contaminent": "arsenic",
"concentration": "0.0420000000"
},
{
"contaminent": "arsenic",
"concentration": "0.0200000000"
},
{
"contaminent": "arsenic",
"concentration": "0.0150000000"
},
{
"contaminent": "arsenic",
"concentration": "0.0320000000"
},
{
"contaminent": "manganese",
"concentration": "0.8700000000"
},
{
"contaminent": "manganese",
"concentration": "0.8400000000"
}
],
Here's an example of what it returns, with my notes in parentheses:

well1 2011 0.042
well1 2011 0.031 (this is the mean of this measurement and the one before)
well1 2011 0.025666666666666667 (this is the mean of the first three measurements)
well1 2011 0.0272 (**THIS IS WHAT I WANT**, but I can't just write a counter function, because the number of results is different for each well I am looking at.)
In summary: there are multiple results for the same contaminant in each year, and I want a single average. But my code as written returns an almost triangular set of output that grows with each line: it computes a running mean at each measurement instead of grouping all of the values together and taking one average.
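For reference, the straightforward fix is to collect all of a contaminant's values for the year first and compute the mean once, after the measurement loop finishes. A minimal sketch, assuming `data` has the shape shown above (hard-coded here so it runs standalone):

```python
import statistics

# Sample data hard-coded in the same shape as the JSON above (assumption:
# only the well/year/measurement structure matters for the fix).
data = {
    "well1": {
        "2011": [
            {"contaminent": "arsenic", "concentration": "0.0420000000"},
            {"contaminent": "arsenic", "concentration": "0.0200000000"},
            {"contaminent": "arsenic", "concentration": "0.0150000000"},
            {"contaminent": "arsenic", "concentration": "0.0320000000"},
        ]
    }
}

for plant in data:
    for year, measurements in data[plant].items():
        # Collect every arsenic reading for this year first...
        arsenic_values = [
            float(m["concentration"])
            for m in measurements
            if m["contaminent"] == "arsenic"
        ]
        # ...then compute the mean exactly once, outside the measurement loop.
        if arsenic_values:
            print(plant, year, statistics.mean(arsenic_values))  # well1 2011 0.02725
```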
We can iterate over the top-level keys and use groupby on the contaminent to achieve the desired result.
from statistics import mean
from operator import itemgetter
from itertools import groupby

cnt = itemgetter('concentration')
cmt = itemgetter('contaminent')

d = {'well1': {'2010': [],
               '2011': [{'concentration': '0.0420000000', 'contaminent': 'arsenic'},
                        {'concentration': '0.0200000000', 'contaminent': 'arsenic'},
                        {'concentration': '0.0150000000', 'contaminent': 'arsenic'},
                        {'concentration': '0.0320000000', 'contaminent': 'arsenic'},
                        {'concentration': '0.8700000000', 'contaminent': 'manganese'},
                        {'concentration': '0.8400000000', 'contaminent': 'manganese'}]}}

top_level = d.keys()
for key in top_level:
    for year, value in d.get(key).items():
        if not value:
            print('The year {} has no values to compute'.format(year))
        else:
            for k, v in groupby(sorted(value, key=cmt), key=cmt):
                mean_ = mean(map(float, map(cnt, v)))
                print('{} {} {} {}'.format(key, year, k, mean_))
The year 2010 has no values to compute
well1 2011 arsenic 0.02725
well1 2011 manganese 0.855
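A detail worth knowing about this approach: `itertools.groupby` only merges *consecutive* equal keys, which is why the code sorts by `cmt` before grouping. A small illustration:

```python
from itertools import groupby

keys = ['arsenic', 'manganese', 'arsenic']
# Without sorting, the second 'arsenic' starts a new group:
print([k for k, _ in groupby(keys)])          # ['arsenic', 'manganese', 'arsenic']
# Sorting first makes equal keys adjacent, so each contaminant groups once:
print([k for k, _ in groupby(sorted(keys))])  # ['arsenic', 'manganese']
```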
If you have a lot of measures, you should avoid itertools.groupby, since it needs a sorted list and sorting is expensive. It's easy to build a dictionary with the values grouped by well, year and contaminent using setdefault:
>>> import json
>>> data_by_year_by_well = json.loads(text)
>>> d = {}
>>> for w, data_by_year in data_by_year_by_well.items():
...     for y, data in data_by_year.items():
...         for item in data:
...             d.setdefault(w, {}).setdefault(y, {}).setdefault(item['contaminent'], []).append(float(item['concentration']))
...
>>> d
{'well1': {'2011': {'arsenic': [0.042, 0.02, 0.015, 0.032], 'manganese': [0.87, 0.84]}}}
Now, compute the mean (or the median, or any aggregate value):
>>> from statistics import mean
>>> {w: {y: {c: mean(v) for c, v in v_by_c.items()} for y, v_by_c in d_by_y.items()} for w, d_by_y in d.items()}
{'well1': {'2011': {'arsenic': 0.02725, 'manganese': 0.855}}}
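The same grouping can be written with collections.defaultdict instead of the chained setdefault calls. A sketch under the same data shape, using a hypothetical inline sample in place of json.loads(text):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inline sample in the same well -> year -> measurements shape.
data_by_year_by_well = {
    "well1": {
        "2011": [
            {"contaminent": "arsenic", "concentration": "0.042"},
            {"contaminent": "arsenic", "concentration": "0.020"},
            {"contaminent": "manganese", "concentration": "0.87"},
        ]
    }
}

# Nested defaultdicts create missing levels on first access, so each
# d[w][y][contaminant] list exists as soon as it is appended to.
d = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
for w, data_by_year in data_by_year_by_well.items():
    for y, data in data_by_year.items():
        for item in data:
            d[w][y][item["contaminent"]].append(float(item["concentration"]))

# Aggregate exactly as before: one mean per well/year/contaminant.
means = {w: {y: {c: mean(v) for c, v in v_by_c.items()}
             for y, v_by_c in d_by_y.items()}
         for w, d_by_y in d.items()}
print(means)
```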