Find the median from a dictionary of values and their number of occurrences?

Question

I have a dictionary that looks like this (although much larger):

data = {
    100: 8,
    110: 2,
    1000: 4,
    2200: 3,
    4000: 1,
    11000: 1,
}

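Each pair consists of value: number of occurrences in my dataset. I need to calculate the median of my dataset. Any hints or ideas on how to do it?

I'm using Python 3.6.

Edit:

I don't want to create a list (because of the size of my dataset). The size of the list was actually the reason for using a dictionary instead. So I'm looking for another way.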

Answer 1

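I believe this solution works as well, at least for positive numbers. I tested some datasets in conjunction with your answer, and as far as I can tell they behave the same way.

(sorted_dict is the dictionary sorted by its keys)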

    def median_from_sorted_dict(sorted_dict):
        keys = list(sorted_dict.keys())
        counts = list(sorted_dict.values())
        half = sum(counts) / 2
        sum_var = 0
        # Find the index of the first key at which the running total has
        # already reached the middle of the dataset (default: the last key).
        # enumerate() avoids list.index(), which returns the wrong position
        # when two keys share the same count.
        index = len(counts) - 1
        for i, val in enumerate(counts):
            if half - sum_var > 0:
                sum_var += val
            else:
                index = i
                break
        # Compare the number of records from this key onwards with the number before it
        after, before = sum(counts[index:]), sum(counts[:index])
        if after != before:
            if after > before:
                median = keys[index]
            else:
                median = keys[index - 1]
        else:
            # The two middle records straddle two neighbouring keys
            median = (keys[index - 1] + keys[index]) / 2
        return median
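
The snippet above assumes that sorted_dict already exists. A minimal sketch of one way to build it from the question's data, relying on dicts preserving insertion order (guaranteed by the language from Python 3.7, an implementation detail of CPython 3.6). The keys are deliberately listed out of order here to show the effect:

```python
data = {
    1000: 4,
    100: 8,
    11000: 1,
    110: 2,
    4000: 1,
    2200: 3,
}

# sorted() orders the (value, count) pairs by value;
# dict() then keeps them in that order.
sorted_dict = dict(sorted(data.items()))

print(list(sorted_dict))  # [100, 110, 1000, 2200, 4000, 11000]
```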

Answer 2

This works on Python 3.6+ when your dict is ordered, i.e. its keys were inserted in ascending order.

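def find_weighted_median(d):
    # d must be a {value: count} dict whose keys are in ascending order
    total = sum(d.values())
    # 0-based positions of the two middle records (identical when total is odd)
    lower_location = (total - 1) // 2
    upper_location = total // 2
    lower = None
    upper = None
    running_total = 0
    for val, count in d.items():
        # This key covers positions running_total .. running_total + count - 1
        if lower is None and running_total <= lower_location < running_total + count:
            lower = val
        if upper is None and running_total <= upper_location < running_total + count:
            upper = val
        # Compare against None so that a value of 0 is handled correctly
        if lower is not None and upper is not None:
            return (lower + upper) / 2
        running_total += count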

Answer 3

So, not having found a satisfying answer, this is what I came up with:

from collections import OrderedDict
import statistics

d = {
 100: 8,
 110: 2,
 1000: 4,
 2200: 3,
 4000: 1,
 11000: 1,
}

# Sort the dictionary by key
values_sorted = OrderedDict(sorted(d.items(), key=lambda t: t[0]))
index = sum(values_sorted.values())/2

# Decide whether the number of records is an even or odd number
even = index.is_integer()

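x = True  # stays True until the lower middle value has been recorded

# Compute median
for value, occurrences in values_sorted.items():
    index -= occurrences
    if index < 0 and x is True:
        # Both middle records fall inside this key
        median_manual = value
        break
    elif index == 0 and even is True:
        # The lower middle record is the last occurrence of this key
        median_manual = value/2
        x = False
    elif index < 0 and x is False:
        # The upper middle record is in this key: add its half
        median_manual += value/2
        break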

# Create a list of all records and compute median using statistics package
values_list = []
for val, count in d.items():
    values_list.extend([val] * count)

median_computed = statistics.median(values_list)

# Test the two results are equal
if median_manual != median_computed:
    raise RuntimeError

I tested it with different datasets and compared the results with the median computed by statistics.median(); the results were the same.
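
The comparison described above could be automated roughly as follows. This is only a sketch: the function name median_manual, the random ranges and the fixed seed are assumptions made for illustration, and the loop is the one from this answer wrapped in a function. Counts are kept at 1 or more, since the loop does not account for keys with a count of 0.

```python
import random
import statistics


def median_manual(d):
    """Median of a {value: count} dict (hypothetical wrapper around the loop above)."""
    index = sum(d.values()) / 2
    even = index.is_integer()
    half_found = False
    for value, occurrences in sorted(d.items()):
        index -= occurrences
        if index < 0 and not half_found:
            # Both middle records fall inside this key
            return value
        elif index == 0 and even:
            # Lower middle record is the last occurrence of this key
            median = value / 2
            half_found = True
        elif index < 0 and half_found:
            # Upper middle record is in this key
            return median + value / 2


random.seed(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    counts = {random.randint(-50, 50): random.randint(1, 9)
              for _ in range(random.randint(1, 10))}
    # Expanding to a list is only acceptable here because the test data is tiny
    expanded = [v for v, c in counts.items() for _ in range(c)]
    assert median_manual(counts) == statistics.median(expanded)
```

Negative keys are included on purpose, so the check also covers values that are not positive.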

Answer 4

Below is a pandas-based solution.

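import pandas as pd

def getMed(item_dict : dict[int, int]) -> float:
    'function finds median of a {value: count} dict'
    df = pd.DataFrame.from_dict(item_dict, orient='index').reset_index()
    df.columns = ['values', 'count']
    df.sort_values('values', inplace=True)
    df['cum_sum'] = df['count'].cumsum()
    total_count = df['cum_sum'].iloc[-1]
    # 1-based ranks of the two middle records (equal when total_count is odd)
    lower_rank = (total_count + 1) // 2
    upper_rank = (total_count + 2) // 2
    # First value whose cumulative count reaches each rank
    lower = df.loc[df['cum_sum'] >= lower_rank, 'values'].iloc[0]
    upper = df.loc[df['cum_sum'] >= upper_rank, 'values'].iloc[0]
    return float((lower + upper) / 2)

Result for your input:

your_dict = {100: 8,
             110: 2,
             1000: 4,
             2200: 3,
             4000: 1,
             11000: 1
            }

getMed(your_dict)
>> 110.0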

Answer 5

Here's my take:

data = {
    100: 8,
    110: 2,
    1000: 4,
    2200: 3,
    4000: 1,
    11000: 2,
}
total_frequency = sum(data.values())                        # 1
middles = (total_frequency+1)//2, (total_frequency+2)//2    # 2

cumulated, first, second = 0, None, None

for key, frequency in data.items():                         # 3
    cumulated += frequency                                  # 3
    if first is None and cumulated >= middles[0]:           # 4
        first = key
    if second is None and cumulated >= middles[1]:          # 4
        second = key


median = (first+second)/2                                   # 5

print(f'''
Middle Frequencies: {middles[0]},{middles[1]}
Middle Values: {first},{second}
Median: {median}
''')

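The steps are:

1. Calculate the total frequency of the table, i.e. the sum of the dictionary's values.
2. Find the two middle frequencies. If the total is odd, they will be the same.
3. Iterate over the table and accumulate the frequencies.
4. If the cumulated frequency has reached one of the middle frequencies, store the key.
5. The median is the average of the two stored keys.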
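
The same steps can also be written as a reusable function. The following is a sketch rather than a definitive implementation: the name median_from_frequency_table is made up for illustration, and it swaps the explicit loop for itertools.accumulate and bisect.bisect_left from the standard library. It assumes a non-empty table with non-negative counts, and it sorts the keys itself, so the input dict does not have to be ordered.

```python
from bisect import bisect_left
from itertools import accumulate


def median_from_frequency_table(table):
    """Median of a {value: frequency} dict (hypothetical helper)."""
    values = sorted(table)                                   # keys in ascending order
    cumulated = list(accumulate(table[v] for v in values))   # step 3
    total_frequency = cumulated[-1]                          # step 1
    middles = (total_frequency + 1) // 2, (total_frequency + 2) // 2  # step 2
    # Step 4: bisect_left returns the first position whose cumulated
    # frequency is >= the requested middle frequency.
    first = values[bisect_left(cumulated, middles[0])]
    second = values[bisect_left(cumulated, middles[1])]
    return (first + second) / 2                              # step 5


data = {100: 8, 110: 2, 1000: 4, 2200: 3, 4000: 1, 11000: 2}
print(median_from_frequency_table(data))  # 555.0
```

Only the keys and one cumulative sum per key are held in memory, so the cost depends on the number of distinct values, not on the number of records.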

Answer 1
1 2019-07-15 16:21:57

Answer 2
0 2018-03-26 15:11:29

Answer 3
0 accepted 2018-03-29 10:32:17

Answer 4
0 2021-10-05 21:14:33

Answer 5
0 2022-06-07 08:14:23
