
How to turn a list of tuples into a “histogram” where the bins contain the tuples?

Say I have a list of coordinates (tuples of a constant length n), where n is determined at runtime. I would like to build what is essentially an n-dimensional histogram, except that each bin holds not a count but all of the coordinate tuples that fall into it.

Example of what I'd like:

Input:

list: [(-0.308, 0.414), (-0.058, -0.279), (0.860, 0.118), (-0.543, -0.093)]
bin_width: 1

Output:

[[[(-0.058, -0.279), (-0.543, -0.093)], [(-0.308, 0.414)]], [[], [(0.860, 0.118)]]]
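
To make the expected output above concrete, here is a small sketch of the index arithmetic that places each tuple. The helper name `bin_index` and the choice of `origin` (the per-dimension minimum rounded down to a multiple of `bin_width`) are assumptions made for illustration; they are not part of the question.

```python
from math import floor

points = [(-0.308, 0.414), (-0.058, -0.279), (0.860, 0.118), (-0.543, -0.093)]
bin_width = 1


def bin_index(point, origin, bin_width):
    """Return the per-dimension bin index of `point`, counted from `origin`."""
    return tuple(floor((x - o) / bin_width) for x, o in zip(point, origin))


# Assumed lower corner of the grid: each dimension's minimum, rounded down
# to the nearest multiple of bin_width.
origin = tuple(floor(min(axis) / bin_width) * bin_width for axis in zip(*points))

for p in points:
    print(p, "->", bin_index(p, origin, bin_width))
```

With this input, `origin` is `(-1, -1)` and the indices come out as `(0, 1)`, `(0, 0)`, `(1, 1)` and `(0, 0)`, which is where the four tuples sit in the nested list above.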

Update: I have a solution now (see my answer below), but if you have a better idea, please share. In particular, it would be nice to convert this method to generators instead of lists: my example here is short, but in practice my input list might be very large, and I only need to use the output once.
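
On the generator point: a bin cannot be known to be complete until the whole input has been read, so the binning step itself cannot be fully lazy. One possible alternative, sketched here under the assumption that a sparse result is acceptable, reads any iterable exactly once and stores only the non-empty bins in a dictionary keyed by bin index. The name `sparse_bins` is hypothetical, and the keys are absolute indices (`floor(x / bin_width)`) rather than positions in a nested list.

```python
from collections import defaultdict
from math import floor


def sparse_bins(coordinates, bin_width):
    """Group coordinate tuples by bin index in a single pass.

    `coordinates` may be any iterable, including a generator; it is
    consumed once and never needs to be materialised as a list.
    """
    bins = defaultdict(list)
    for point in coordinates:
        # Absolute bin index per dimension; negative coordinates get negative indices.
        key = tuple(floor(x / bin_width) for x in point)
        bins[key].append(point)
    return bins


points = [(-0.308, 0.414), (-0.058, -0.279), (0.860, 0.118), (-0.543, -0.093)]
result = sparse_bins((p for p in points), 1)   # a generator works as input
for key in sorted(result):
    print(key, result[key])
```

Because no minimum or maximum is needed up front, this avoids the extra pass over the input, at the cost of not representing empty bins explicitly.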

Hopefully I did this right.

Functions:

from math import ceil, floor


def minmax(coordinate_list):                                        # yields the (minimum, maximum) pair of the values
    return map(lambda x: (min(x), max(x)), zip(*coordinate_list))   # occurring in each coordinate of the input tuples


def find_range(min_max_list):                                           # for each dimension finds the necessary
    return map(lambda x, y: ceil(y) - floor(x), *zip(*min_max_list))    # range for the nested list


def find_bin_range(ranges, bin_width):     # turns the ranges in coordinate units into ones in terms of bin widths
    return [max(r / bin_width, 1) for r in ranges]


def build_bins(bin_ranges):     # given a list of ranges, recursively builds a nested list structure to be filled:
    if not bin_ranges:          # the histogram bins
        return []
    return [build_bins(bin_ranges[1:]) for _ in range(ceil(bin_ranges[0]))]


def access_bin(coordinates, key, bins, bin_width, min_max_list):    # recursively descends to the matching bin
    if not key:                                                     # and appends the coordinates to it
        bins.append(coordinates)
    else:
        minimum, _ = min_max_list[0]
        i = int((key[0] - floor(minimum)) / bin_width)
        i = min(i, len(bins) - 1)   # a maximum lying exactly on the upper edge goes into the last bin
        return access_bin(coordinates, key[1:], bins[i], bin_width, min_max_list[1:])


def fill_bins(coordinate_list, bins, bin_width, min_max_list):    # fills each bin with appropriate coordinates
    for coordinates in coordinate_list:
        access_bin(coordinates, coordinates, bins, bin_width, min_max_list)
    return bins


def coordinate_list_to_bins(coordinate_list, bin_width):    # the complete procedure
    min_max_list = list(minmax(coordinate_list))
    ranges = find_range(min_max_list)
    bin_ranges = find_bin_range(ranges, bin_width)
    bins = build_bins(bin_ranges)
    return fill_bins(coordinate_list, bins, bin_width, min_max_list)

Usage:

import random


coordinate_list = [(random.uniform(-1, 1), random.uniform(-.5, .5)) for _ in range(4)]
bin_width = 1
print(coordinate_list)
print(coordinate_list_to_bins(coordinate_list, bin_width))

Output:

[(0.197, 0.278), (0.333, -0.030), (0.363, -0.298), (0.553, -0.286)]
[[[(0.333, -0.030), (0.363, -0.298), (0.553, -0.286)], [(0.197, 0.278)]]]
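
As a cross-check on both outputs shown on this page, here is a compact, self-contained sketch that fills the same kind of nested list with a loop instead of recursive access. It is a variant rather than the answer's exact method: the name `dense_bins` is hypothetical, and it assumes bin edges aligned to multiples of `bin_width`, which coincides with the answer's layout for `bin_width = 1`.

```python
from math import floor


def dense_bins(points, bin_width):
    """Return nested lists of bins, each holding the tuples that fall into it."""
    points = list(points)
    # Lowest and highest occupied bin index in each dimension.
    lows = [floor(min(axis) / bin_width) for axis in zip(*points)]
    highs = [floor(max(axis) / bin_width) for axis in zip(*points)]
    shape = [hi - lo + 1 for lo, hi in zip(lows, highs)]

    def empty(shape):
        # Recursively build the nested structure; the innermost lists are the bins.
        if not shape:
            return []
        return [empty(shape[1:]) for _ in range(shape[0])]

    bins = empty(shape)
    for p in points:
        cell = bins
        for x, lo in zip(p, lows):
            cell = cell[floor(x / bin_width) - lo]   # step down one dimension
        cell.append(p)
    return bins


question_input = [(-0.308, 0.414), (-0.058, -0.279), (0.860, 0.118), (-0.543, -0.093)]
print(dense_bins(question_input, 1))
```

For the question's input this prints the nested list given as the desired output in the question, and for the four points in the usage example it gives the single row of two bins shown above.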

