簡體   English   中英

如何從python的JSON字典中刪除額外的統計平均值結果?

[英]How to remove extra statistical mean results from JSON dictionary in python?

我正在使用python3-我正在嘗試根據井中污染物的JSON字典中的測量結果確定平均值。 當我返回代碼時,它顯示每行數據的平均值。 本質上,我想為一種污染物的所有結果找到一個均值。 每年在同一污染物上有多個結果。

for plants in data:

  for year in ["2010", "2011", "2012", "2013", "2014":

  arsenic_values = []
  manganese_values = []

  all_year_data = data[plants][year]

    for measurement in all_year_data:
    if measurement['contaminent'] == "arsenic":

      arsenic_values.append(float(measurement["concentration"]))
      arsenic_mean = statistics.mean(arsenic_values)

        print(plants, year, arsenic_mean)

這是一個兩年的JSON外觀示例。

  "well1": {
    "2010": [],
    "2011": [
      {
        "contaminent": "arsenic",
        "concentration": "0.0420000000"
      },
      {
        "contaminent": "arsenic",
        "concentration": "0.0200000000"
      },
      {
        "contaminent": "arsenic",
        "concentration": "0.0150000000"
      },
      {
        "contaminent": "arsenic",
        "concentration": "0.0320000000"
      },
      {
        "contaminent": "manganese",
        "concentration": "0.8700000000"
      },
      {
        "contaminent": "manganese",
        "concentration": "0.8400000000"
      }
    ],

Example of what it returns with my notes in ()

well1 2011 0.042
well1 2011 0.031   (this is the mean of the measurement before)
well1 2011 0.025666666666666667    (this is the mean of the measurement before and before that)    
well1 2011 0.0272    (**THIS IS WHAT I WANT** but I can't write like a counter function because the result I want is different for each well I am looking at.

IN summation:
There are multiple results for each year of the same containment and I want to find the average. But my code as it is written returns almost a triangular data that grows with each line. SO its finding's the average of each line for the containment rather than grouping all together and taking one average.

我們可以遍歷頂級鍵並按污染物groupby以達到所需的結果。

from statistics import mean
from operator import itemgetter
from itertools import groupby

cnt = itemgetter('concentration')
cmt = itemgetter('contaminent')

d = {'well1': {'2010': [],
  '2011': [{'concentration': '0.0420000000', 'contaminent': 'arsenic'},
   {'concentration': '0.0200000000', 'contaminent': 'arsenic'},
   {'concentration': '0.0150000000', 'contaminent': 'arsenic'},
   {'concentration': '0.0320000000', 'contaminent': 'arsenic'},
   {'concentration': '0.8700000000', 'contaminent': 'manganese'},
   {'concentration': '0.8400000000', 'contaminent': 'manganese'}]}}

top_level = d.keys()
for key in top_level:
    for year, value in d.get(key).items():
        if not value:
            print('The year {} has no values to compute'.format(year))
        else:
            for k, v in groupby(sorted(value, key=cmt), key=cmt):
                mean_ = mean(map(float, map(cnt, v)))
                print('{} {} {} {}'.format(key, year, k, mean_))

The year 2010 has no values to compute
well1 2011 arsenic 0.02725
well1 2011 manganese 0.855

鏈接到您可能不熟悉的一些使用過的概念:

地圖

itemgetter

通過...分組

如果您有很多措施,則應避免使用itertools.groupby因為它需要排序列表,並且排序很昂貴。 使用setdefault使用以wellyearcontaminent分組的值來構建字典很容易:

>>> import json
>>> data_by_year_by_well = json.loads(text)
>>> d = {}
>>> for w, data_by_year in data_by_year_by_well.items():
...     for y, data in data_by_year.items():
...         for item in data:
...             d.setdefault(w, {}).setdefault(y, {}).setdefault(item['contaminent'], []).append(float(item['concentration']))
...
>>> d
{'well1': {'2011': {'arsenic': [0.042, 0.02, 0.015, 0.032], 'manganese': [0.87, 0.84]}}}

現在,計算平均值(或中位數,或任何合計值):

>>> from statistics import mean
>>> {w: {y: {c: mean(v) for c, v in v_by_c.items()} for y, v_by_c in d_by_y.items()} for w, d_by_y in d.items()}
{'well1': {'2011': {'arsenic': 0.02725, 'manganese': 0.855}}}

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM