How do I remove the extra statistics mean results from a JSON dictionary in Python?

Question

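I'm using Python 3 and trying to determine the mean of the measurements in a JSON dictionary of contaminants in wells. When I run my code, it prints a mean for every row of data. Essentially, I want a single mean over all the results for one contaminant. There are multiple results for the same contaminant each year.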

import statistics

for plants in data:
    for year in ["2010", "2011", "2012", "2013", "2014"]:
        arsenic_values = []
        manganese_values = []

        all_year_data = data[plants][year]

        for measurement in all_year_data:
            if measurement['contaminent'] == "arsenic":
                arsenic_values.append(float(measurement["concentration"]))
                arsenic_mean = statistics.mean(arsenic_values)

                print(plants, year, arsenic_mean)

Here is an example of what the JSON looks like for two years.

  "well1": {
    "2010": [],
    "2011": [
      {
        "contaminent": "arsenic",
        "concentration": "0.0420000000"
      },
      {
        "contaminent": "arsenic",
        "concentration": "0.0200000000"
      },
      {
        "contaminent": "arsenic",
        "concentration": "0.0150000000"
      },
      {
        "contaminent": "arsenic",
        "concentration": "0.0320000000"
      },
      {
        "contaminent": "manganese",
        "concentration": "0.8700000000"
      },
      {
        "contaminent": "manganese",
        "concentration": "0.8400000000"
      }
    ],

Example of what it returns with my notes in ()

well1 2011 0.042
well1 2011 0.031   (this is the mean of this measurement and the one before)
well1 2011 0.025666666666666667    (this is the mean of this measurement and the two before it)
well1 2011 0.02725    (**THIS IS WHAT I WANT**, but I can't write something like a counter function, because the result I want is different for each well I am looking at.)

In summary:
There are multiple results per year for the same contaminant, and I want to find their average. But my code as written returns an almost triangular output that grows with each line. So it is computing a running average after each row for the contaminant, rather than grouping all the rows together and taking one average.

Answer 1

We can iterate over the top-level keys and groupby contaminant to get the desired result.

from statistics import mean
from operator import itemgetter
from itertools import groupby

cnt = itemgetter('concentration')
cmt = itemgetter('contaminent')

d = {'well1': {'2010': [],
  '2011': [{'concentration': '0.0420000000', 'contaminent': 'arsenic'},
   {'concentration': '0.0200000000', 'contaminent': 'arsenic'},
   {'concentration': '0.0150000000', 'contaminent': 'arsenic'},
   {'concentration': '0.0320000000', 'contaminent': 'arsenic'},
   {'concentration': '0.8700000000', 'contaminent': 'manganese'},
   {'concentration': '0.8400000000', 'contaminent': 'manganese'}]}}

top_level = d.keys()
for key in top_level:
    for year, value in d.get(key).items():
        if not value:
            print('The year {} has no values to compute'.format(year))
        else:
            for k, v in groupby(sorted(value, key=cmt), key=cmt):
                mean_ = mean(map(float, map(cnt, v)))
                print('{} {} {} {}'.format(key, year, k, mean_))

The year 2010 has no values to compute
well1 2011 arsenic 0.02725
well1 2011 manganese 0.855
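
The sorted call in the code above matters: groupby only merges items that sit next to each other, so unsorted input would split one contaminant into several groups. A minimal sketch with a made-up list of names (not taken from the data above):

```python
from itertools import groupby

# Hypothetical input: the two "arsenic" entries are not adjacent.
names = ["arsenic", "manganese", "arsenic"]

# Without sorting, each run of equal neighbours becomes its own group.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(names)]

# After sorting, equal names are adjacent and end up in a single group.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(names))]

print(unsorted_groups)  # [('arsenic', 1), ('manganese', 1), ('arsenic', 1)]
print(sorted_groups)    # [('arsenic', 2), ('manganese', 1)]
```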

Links to some of the concepts used that you may not be familiar with:

map

itemgetter

groupby
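
For comparison, the loop from the question can probably also be repaired with a smaller change: collect all values for a year first and call mean once, after the inner loop, instead of after every append. The sketch below assumes the same JSON shape as in the question; the function name arsenic_means and the trimmed sample data are only illustrative.

```python
from statistics import mean

# Trimmed sample in the same shape as the question's JSON (illustrative only).
data = {
    "well1": {
        "2010": [],
        "2011": [
            {"contaminent": "arsenic", "concentration": "0.0420000000"},
            {"contaminent": "arsenic", "concentration": "0.0200000000"},
            {"contaminent": "arsenic", "concentration": "0.0150000000"},
            {"contaminent": "arsenic", "concentration": "0.0320000000"},
            {"contaminent": "manganese", "concentration": "0.8700000000"},
        ],
    }
}


def arsenic_means(data):
    """Return {(well, year): mean arsenic concentration}, skipping empty years."""
    results = {}
    for well, years in data.items():
        for year, measurements in years.items():
            # Gather every arsenic value for this well and year first.
            values = [
                float(m["concentration"])
                for m in measurements
                if m["contaminent"] == "arsenic"
            ]
            # mean() raises StatisticsError on an empty list, so guard against it.
            if values:
                results[(well, year)] = mean(values)
    return results


# One line per well and year, instead of one line per measurement.
for (well, year), value in arsenic_means(data).items():
    print(well, year, value)
```

Because the mean is taken only once per well and year, no counter is needed, and wells with a different number of measurements are handled the same way.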

Answer 2

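If you have many measurements, you should avoid itertools.groupby, because it requires a sorted list and sorting is expensive. It is easy to build a dictionary of the values grouped by well, year, and contaminent using setdefault: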

>>> import json
>>> data_by_year_by_well = json.loads(text)
>>> d = {}
>>> for w, data_by_year in data_by_year_by_well.items():
...     for y, data in data_by_year.items():
...         for item in data:
...             d.setdefault(w, {}).setdefault(y, {}).setdefault(item['contaminent'], []).append(float(item['concentration']))
...
>>> d
{'well1': {'2011': {'arsenic': [0.042, 0.02, 0.015, 0.032], 'manganese': [0.87, 0.84]}}}

Now compute the mean (or the median, or any other aggregate):

>>> from statistics import mean
>>> {w: {y: {c: mean(v) for c, v in v_by_c.items()} for y, v_by_c in d_by_y.items()} for w, d_by_y in d.items()}
{'well1': {'2011': {'arsenic': 0.02725, 'manganese': 0.855}}}
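
To make the "any other aggregate" point concrete, the nested comprehension can be wrapped in a small helper that takes the aggregate as an argument. This is only a sketch: the helper name aggregate is hypothetical, and d is retyped here with the grouped values shown above so the snippet runs on its own.

```python
from statistics import mean, median

# Grouped values in the shape built above: well -> year -> contaminent -> floats.
d = {"well1": {"2011": {"arsenic": [0.042, 0.02, 0.015, 0.032],
                        "manganese": [0.87, 0.84]}}}


def aggregate(grouped, func):
    """Apply func to every list of values, keeping the well/year/contaminent nesting."""
    return {
        well: {
            year: {name: func(values) for name, values in by_name.items()}
            for year, by_name in by_year.items()
        }
        for well, by_year in grouped.items()
    }


means = aggregate(d, mean)      # same numbers as the comprehension above
medians = aggregate(d, median)  # swap in a different aggregate

print(means)
print(medians)
```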

Solution 1
0 2019-04-23 01:37:18

Solution 2
0 2019-04-23 15:05:24
