Python: categorising a list by orders of magnitude

I have a nested list with values:

list = [
...
['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135],
...]

I want to count the values in the second column (index 1) by order of magnitude, starting at the lowest order of magnitude and ending at the largest, e.g.:

99.23033109735835 = 10 <= x < 100
142.8576737907048 = 100 <= x < 1000
             9432 = 1000 <= x < 10000

The aim is to output a simple character (#) count showing how many values fall in each category, e.g.:

  10 <= x < 100: ###
100 <= x < 1000: #########

I've started by grabbing the max() and min() values for the column in order to automatically calculate the largest and smallest magnitude categories, but I'm not sure how to associate each value in the column with an order of magnitude. If someone could point me in the right direction or give me some ideas, I would be most grateful.

This function will turn your float into an integer order of magnitude:

>>> import math
>>> def magnitude(x):
...     return int(math.log10(x))
... 
>>> magnitude(99.23)
1
>>> magnitude(9432)
3

(So 10 ** magnitude(x) <= x < 10 ** (1 + magnitude(x)) for all x >= 1.)

Just use the magnitude as a key, and count the occurrences per key. defaultdict may be helpful here.
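A minimal sketch of that counting step, assuming Python 3 and the three sample rows from the question, trimmed to the first two columns. The names `rows`, `counts` and `lines` are illustrative and not part of the original answer.

```python
import math
from collections import defaultdict

def magnitude(x):
    # Truncating version from the answer above; intended for x >= 1.
    return int(math.log10(x))

# Sample rows from the question, trimmed to name and second column.
rows = [
    ['Country1', 142.8576737907048],
    ['Country2', 109.33326343550823],
    ['Country3', 99.23033109735835],
]

# Count how many values fall into each order of magnitude.
counts = defaultdict(int)
for row in rows:
    counts[magnitude(row[1])] += 1

# One line per magnitude, lowest first, with one '#' per value.
lines = ["%d <= x < %d: %s" % (10 ** m, 10 ** (m + 1), '#' * counts[m])
         for m in sorted(counts)]
print("\n".join(lines))
```

With the sample data this prints one `#` for the 10 to 100 range and two for the 100 to 1000 range.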


Note that this magnitude only works for x >= 1, i.e. non-negative powers of 10, because int() truncates towards zero: int(math.log10(0.5)) gives 0 rather than -1.

Use

def magnitude(x):
    return int(math.floor(math.log10(x)))

instead if this matters for your use case. (Thanks to larsmans for pointing this out.)
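To make the difference concrete, here is a small comparison of the two variants, assuming Python 3. The names `magnitude_trunc` and `magnitude_floor` are made up for this illustration.

```python
import math

def magnitude_trunc(x):
    # int() truncates towards zero, so negative logarithms round up.
    return int(math.log10(x))

def magnitude_floor(x):
    # floor() always rounds down, which also works for 0 < x < 1.
    return int(math.floor(math.log10(x)))

# The two versions agree for x >= 1 and differ for 0 < x < 1.
for x in (0.02, 0.5, 5, 50):
    print(x, magnitude_trunc(x), magnitude_floor(x))
```

For 0.5 the truncating version reports 0 while the floor version reports -1; for 5 and 50 both agree.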

Extending Useless's answer to all real numbers, you can use:

import math

def magnitude(value):
    if value == 0:
        return 0
    return int(math.floor(math.log10(abs(value))))

Test cases:

In [123]: magnitude(0)
Out[123]: 0

In [124]: magnitude(0.1)
Out[124]: -1

In [125]: magnitude(0.02)
Out[125]: -2

In [126]: magnitude(150)
Out[126]: 2

In [127]: magnitude(-5280)
Out[127]: 3
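As a rough sketch of how this version could feed the histogram from the question, the snippet below walks every magnitude between the smallest and largest one found, so empty categories are printed too. It assumes Python 3; the `values` list is hypothetical sample data, and labelling the ranges with |x| is a choice made here because the function uses abs().

```python
import math
from collections import Counter

def magnitude(value):
    if value == 0:
        return 0
    return int(math.floor(math.log10(abs(value))))

# Hypothetical sample data covering small, large and negative values.
values = [0.02, 0.5, 7, 150, -5280]

counts = Counter(magnitude(v) for v in values)

# Derive the lowest and highest categories from the data itself.
lo, hi = min(counts), max(counts)
for m in range(lo, hi + 1):
    # A Counter returns 0 for missing keys, so empty ranges print no '#'.
    print("1e%d <= |x| < 1e%d: %s" % (m, m + 1, '#' * counts[m]))
```

Note that zero is mapped to magnitude 0 by this function, so it would be counted together with values between 1 and 10.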

If x is one of your numbers, what is len(str(int(x)))?

Or, if you have numbers less than 1, what is int(math.log10(x))?

(See also the docs for log10. Also note that int() truncates towards zero here, which may not be what you want: see ceil and floor, and note that you may need int(ceil(...)) or int(floor(...)) to get an integer answer.)
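To illustrate the digit-counting idea, the sketch below compares it with the floor-based logarithm for the values from the question. It assumes Python 3, and the helper name `digits` is made up for this example.

```python
import math

def digits(x):
    # Number of digits in the integer part of a non-negative number.
    return len(str(int(x)))

# For x >= 1, digits(x) - 1 matches floor(log10(x)).
for x in (99.23033109735835, 142.8576737907048, 9432):
    print(x, digits(x) - 1, int(math.floor(math.log10(x))))
```

The two agree only for x >= 1: for 0 < x < 1, int(x) is 0, so the digit count is 1 regardless of how small x is.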

To categorize by order of magnitude, do:

from math import floor, log10
from collections import Counter
counter =  Counter(int(floor(log10(x[1]))) for x in list)

Key 1 covers 10 up to (but not including) 100; key 2 covers 100 up to (but not including) 1000.

print(counter)
Counter({2: 2, 1: 1})

Then it's simply a matter of printing it out:

for x in sorted(counter):
    print("%d <= x < %d: %d" % (10**x, 10**(x+1), counter[x]))

Alternatively, using bisect with explicit bucket boundaries:

import bisect
from collections import defaultdict

lis1 = [['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135],
]
lis2 = [0, 100, 1000, 10000]  # bucket boundaries, in ascending order

dic = defaultdict(int)

for row in lis1:
    x = row[1]
    ind = bisect.bisect(lis2, x)
    # Only count values inside the range covered by lis2.
    if lis2[0] <= x < lis2[-1]:
        sm, bi = lis2[ind - 1], lis2[ind]
        # Key on the bucket alone so values in the same bucket share a count.
        dic["{} <= x < {}".format(sm, bi)] += 1

for k, v in dic.items():
    print(k, '-->', v)

output:

100 <= x < 1000 --> 2
0 <= x < 100 --> 1

In case you ever want overlapping ranges or ranges with arbitrary bounds (not tied to orders of magnitude, powers of 2, or any other predictable series):

from collections import defaultdict
lst = [
    ['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
    ['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
    ['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135],
]

buckets = {
    '10<=x<100': lambda x: 10<=x<100,
    '100<=x<1000': lambda x: 100<=x<1000,
}

result = defaultdict(int)
for item in lst:
    second_column = item[1]
    for label, range_check in buckets.items():
        if range_check(second_column):
            result[label] += 1

print(result)
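The buckets above do not overlap, so here is a short sketch of the overlapping case, assuming Python 3. The bucket labels and bounds are hypothetical and chosen only so that one value lands in two buckets.

```python
from collections import defaultdict

# Second-column values from the question.
values = [142.8576737907048, 109.33326343550823, 99.23033109735835]

# Hypothetical buckets that overlap between 100 and 120.
buckets = {
    'x<120': lambda x: x < 120,
    '100<=x<1000': lambda x: 100 <= x < 1000,
}

result = defaultdict(int)
for v in values:
    # Every bucket is checked, so one value may be counted more than once.
    for label, range_check in buckets.items():
        if range_check(v):
            result[label] += 1

print(dict(result))
```

Because 109.33 satisfies both checks, the counts add up to four even though there are only three values, which is the expected behaviour with overlapping ranges.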

Another option, using bisect:

import bisect
from collections import Counter
list0 = [
['Country1', 142.8576737907048, 207.69725105029553, 21.613192419863577, 15.129178465784218],
['Country2', 109.33326343550823, 155.6847323746669, 15.450489646386226, 14.131554442715336],
['Country3', 99.23033109735835, 115.37122637190915, 5.380298424850267, 5.422030104456135]
]

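magnitudes = [10**x for x in range(5)]  # [1, 10, 100, 1000, 10000]
# bisect returns the index of the first boundary greater than the value;
# for 1 <= value < 10000 that is the number of digits in its integer part.
c = Counter(bisect.bisect(magnitudes, x[1]) for x in list0)
for x in c:
    print(x, '#' * c[x])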
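The keys printed above are bisect indexes rather than ranges. One possible way to turn them into the labels asked for in the question is sketched below, assuming Python 3. The `values` and `lines` names and the open-ended labels for out-of-range values are choices made for this illustration.

```python
import bisect
from collections import Counter

# Second-column values from the question.
values = [142.8576737907048, 109.33326343550823, 99.23033109735835]

magnitudes = [10 ** e for e in range(5)]  # [1, 10, 100, 1000, 10000]

c = Counter(bisect.bisect(magnitudes, v) for v in values)

lines = []
for i in sorted(c):
    if 0 < i < len(magnitudes):
        # Index i means magnitudes[i - 1] <= value < magnitudes[i].
        label = "%d <= x < %d" % (magnitudes[i - 1], magnitudes[i])
    elif i == 0:
        label = "x < %d" % magnitudes[0]
    else:
        label = "x >= %d" % magnitudes[-1]
    lines.append("%s: %s" % (label, '#' * c[i]))

print("\n".join(lines))
```

Sorting the indexes keeps the output ordered from the lowest range to the highest.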
