
Build histogram

I'm trying to make a histogram in Python. I'm starting with the following snippet:

def histogram(L):
    d = {}
    for x in L:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    return d

I understand it uses a dictionary to solve the problem, but I'm confused about the 4th line: if x in d:

d is still being constructed and there's nothing in it yet, so how can x ever be in d?

Keep in mind that the if is inside a for loop.

So, when you're looking at the very first item in L, there is nothing in d. But by the time you get to the next item in L, there is something in d, so you need to check whether to make a new bin in the histogram (d[x] = 1) or add the item to an existing bin (d[x] += 1).
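The two cases (new bin vs. existing bin) can also be folded into a single line with the built-in dict.get, which returns a default value when the key is missing. This is a minimal sketch; histogram_get is a hypothetical name used only for illustration:

```python
def histogram_get(L):
    """Hypothetical variant of histogram() built on dict.get."""
    d = {}
    for x in L:
        # d.get(x, 0) returns the current count, or 0 if x has no bin yet,
        # so this one line covers both "new bin" and "existing bin".
        d[x] = d.get(x, 0) + 1
    return d


print(histogram_get(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```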

In Python, we actually have some shortcuts for this:

from collections import defaultdict

def histogram(L):
    d = defaultdict(int)
    for x in L:
        d[x] += 1
    return d

This automatically starts each bin in d at zero (what int() returns), so you don't have to check whether the bin exists. On Python 2.7 or higher:

from collections import Counter

d = Counter(L)

This automatically builds a mapping from each item in L to its frequency. No other code is required.
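A short sketch of what Counter(L) gives back, using a made-up list L: it behaves like a dict of counts, returns 0 for items it has never seen instead of raising KeyError, and can list the most frequent items with most_common.

```python
from collections import Counter

L = [3, 1, 3, 2, 3, 1]  # sample data, made up for illustration
d = Counter(L)

print(d[3])              # 3
print(d[42])             # 0: a missing item counts as zero, no KeyError
print(d.most_common(2))  # [(3, 3), (1, 2)]
```

Note that reading d[42] does not add 42 to the counter; only assignments and update() do.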

You can create a histogram with a dict comprehension:

histogram = {key: L.count(key) for key in set(L)}
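A quick check that the comprehension agrees with Counter on a sample list (the data is made up). Keep in mind that L.count(key) walks the whole list once for every distinct key, so this approach does more work than a single-pass loop or Counter when L has many distinct values.

```python
from collections import Counter

L = ["red", "blue", "red", "green", "red", "blue"]  # sample data

# One L.count() call per distinct key; each call scans all of L.
histogram = {key: L.count(key) for key in set(L)}

# Counter produces the same counts in a single pass over L.
print(histogram == dict(Counter(L)))  # True
```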

The code inside the for loop is executed once for each element in L, with x being the value of the current element.

Let's look at the simple case where L is the list [3, 3]. The first time through the loop, d is empty, x is 3, and 3 in d is False, so d[3] is set to 1. The next time through the loop, x is 3 again and 3 in d is True, so d[3] is incremented by 1 (to 2).
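To watch this happen, here is a hypothetical copy of the question's function (histogram_traced is an illustrative name) with a print() added to each branch; it uses f-strings, so it assumes Python 3.6+. Running it on [3, 3] reproduces the walk-through above.

```python
def histogram_traced(L):
    """Hypothetical copy of histogram() that prints each decision."""
    d = {}
    for x in L:
        if x in d:
            d[x] += 1
            print(f"x={x!r}: already in d, increment -> {d}")
        else:
            d[x] = 1
            print(f"x={x!r}: not in d yet, new bin -> {d}")
    return d


histogram_traced([3, 3])
# x=3: not in d yet, new bin -> {3: 1}
# x=3: already in d, increment -> {3: 2}
```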

You can use a Counter, available in Python 2.7 and Python 3.1+.

>>> # init empty counter
>>> from collections import Counter
>>> c = Counter()

>>> # add a single sample to the histogram
>>> c.update([4])
>>> # add several samples at once
>>> c.update([4, 2, 2, 5])

>>> # print content
>>> print(c)
Counter({4: 2, 2: 2, 5: 1})

Counter also supports several handy operations on counters, such as addition, subtraction, intersection and union. A Counter can count any hashable object, i.e. anything that can be used as a dictionary key.
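A small sketch of the counter arithmetic mentioned above, on two made-up samples. As the collections documentation describes it, + adds counts, - subtracts them and keeps only positive results, & keeps the minimum of each count, and | keeps the maximum.

```python
from collections import Counter

a = Counter([1, 1, 2, 3])  # counts: 1 -> 2, 2 -> 1, 3 -> 1
b = Counter([1, 2, 2, 4])  # counts: 1 -> 1, 2 -> 2, 4 -> 1

added = a + b         # 1 -> 3, 2 -> 3, 3 -> 1, 4 -> 1
subtracted = a - b    # 1 -> 1, 3 -> 1 (2 would be -1, so it is dropped)
intersection = a & b  # 1 -> 1, 2 -> 1 (minimum of each count)
union = a | b         # 1 -> 2, 2 -> 2, 3 -> 1, 4 -> 1 (maximum of each count)
```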

Others have already explained why you need if x in d. But here is how this code could be written following the "don't ask permission, ask forgiveness" principle (EAFP):

def histogram(L):
    d = {}
    for x in L:
        try:
            d[x] += 1       # x already has a bin
        except KeyError:
            d[x] = 1        # first time x is seen
    return d

The reasoning is that the KeyError is expected only once per distinct value of x (the first time that value is seen), so on most iterations the increment simply succeeds and there is no need to check if x in d first.
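To see how often the except branch actually runs, here is a hypothetical instrumented version (histogram_eafp and its misses counter are illustrative names, not part of the original answer). It returns the counts together with the number of KeyErrors caught, which equals the number of distinct values in L.

```python
def histogram_eafp(L):
    """Hypothetical EAFP histogram that also reports how many KeyErrors it hit."""
    d = {}
    misses = 0
    for x in L:
        try:
            d[x] += 1
        except KeyError:
            # Only reached the first time a given x is seen.
            d[x] = 1
            misses += 1
    return d, misses


counts, misses = histogram_eafp(["a", "b", "a", "a", "c", "b"])
print(counts)  # {'a': 3, 'b': 2, 'c': 1}
print(misses)  # 3, one per distinct value
```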

You can also plot your own histogram in Python using, for example, matplotlib. If you want to see an example of how this could be implemented, you can refer to this answer.


In this specific case, you could do:

temperature = [4,   3,   1,   4,   6,   7,   8,   3,   1]
radius      = [0,   2,   3,   4,   0,   1,   2,  10,   7]
density     = [1,  10,   2,  24,   7,  10,  21, 102, 203]

# hist3d_bubble() is the helper from the linked answer, not a matplotlib function
points, sub = hist3d_bubble(temperature, density, radius, bins=4)
sub.axes.set_xlabel('temperature')
sub.axes.set_ylabel('density')
sub.axes.set_zlabel('radius')

If x isn't in d yet, it gets added to d with d[x] = 1. If the same x shows up in L again later, d[x] += 1 increases the count stored for x.

Try stepping through the code with Online Python Tutor: http://people.csail.mit.edu/pgbovine/python/
