
How can I get a suitable representation of levels in pandas.cut?

Is there an easy way to obtain the values of the levels produced by pandas.cut?

For example:

import numpy as np
import pandas as pd
x = pd.cut(np.arange(0,20), 10)

x
Out[1]: 
 (-0.019, 1.9]
 (-0.019, 1.9]
    (1.9, 3.8]
    (1.9, 3.8]
    (3.8, 5.7]
    (3.8, 5.7]
    (5.7, 7.6]
    (5.7, 7.6]
    (7.6, 9.5]
    (7.6, 9.5]
   (9.5, 11.4]
   (9.5, 11.4]
  (11.4, 13.3]
  (11.4, 13.3]
  (13.3, 15.2]
  (13.3, 15.2]
  (15.2, 17.1]
  (15.2, 17.1]
    (17.1, 19]
    (17.1, 19]
Levels (10): Index(['(-0.019, 1.9]', '(1.9, 3.8]', '(3.8, 5.7]',
                    '(5.7, 7.6]', '(7.6, 9.5]', '(9.5, 11.4]',
                    '(11.4, 13.3]', '(13.3, 15.2]', '(15.2, 17.1]',
                    '(17.1, 19]'], dtype=object)

What I would like to get is something like:

x.magic_method
Out[2]:
[[-0.019, 1.9], [1.9, 3.8], [3.8, 5.7],
 [5.7, 7.6], [7.6, 9.5], [9.5, 11.4],
 [11.4, 13.3], [13.3, 15.2], [15.2, 17.1],
 [17.1, 19]]

or some other representation more suitable for manipulation. Currently I get the index through x.levels, but its entries are unicode strings, so I have to use a couple of loops to get what I want.

UPDATE:

By the way, I need a solution that also works when the second argument is a sequence of bin edges: pd.cut(np.arange(0,20), arr)
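For the sequence case in the update, here is a minimal sketch. When the bins argument is an array of edges, retbins=True hands those edges back, so pairing each edge with its successor gives the levels as numbers. The edge values in arr are made up for illustration; substitute your own.

```python
import numpy as np
import pandas as pd

values = np.arange(0, 20)
# Hypothetical bin edges standing in for the question's arr
arr = np.array([-1, 4, 9, 14, 19])

# retbins=True returns the bin edges alongside the Categorical
x, bins = pd.cut(values, arr, retbins=True)

# Pair each left edge with the next edge to its right
levels = [[float(left), float(right)] for left, right in zip(bins[:-1], bins[1:])]
print(levels)
```

Because the edges come straight from the bins array rather than from the printed labels, no string parsing or rounding is involved.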

You can convert the unicode list to an array with the following code:

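import numpy as np
import pandas as pd
x = pd.cut(np.arange(0,20), 10)
# Each level is a string such as '(1.9, 3.8]': strip the brackets and split on the comma
# (x.levels is the older pandas name; newer versions use x.categories, holding Interval objects)
np.array([t[1:-1].split(",") for t in x.levels], float)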
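If you are on a newer pandas (roughly 0.20 onward, where pd.cut produces interval categories), the string parsing above is not needed. A sketch, assuming such a version: the categories form an IntervalIndex, whose left and right attributes are already numeric. Note that these edges are the rounded values used for display, not the exact bins.

```python
import numpy as np
import pandas as pd

x = pd.cut(np.arange(0, 20), 10)

# The categories of the resulting Categorical are an IntervalIndex
intervals = x.categories

# Stack the left and right edges into an (n_bins, 2) float array
edges = np.column_stack([intervals.left, intervals.right])
print(edges.shape)  # (10, 2)
print(edges.tolist())
```

intervals.to_tuples() is an alternative if you prefer a list of (left, right) tuples.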

You can do this, but it is probably better to explain what you are actually doing; e.g. you already have the Categorical variable.

In [27]: x, bins = pd.cut(np.arange(0,20), 10, retbins=True)

In [28]: [ [ round(l,3), round(r,3) ] for l, r in zip(bins[:-1],bins[1:]) ]
Out[28]: 
[[-0.019, 1.9],
 [1.9, 3.8],
 [3.8, 5.7],
 [5.7, 7.6],
 [7.6, 9.5],
 [9.5, 11.4],
 [11.4, 13.3],
 [13.3, 15.2],
 [15.2, 17.1],
 [17.1, 19.0]]
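Since you already have the Categorical, you may not need the list of levels at all. A sketch (my own illustration, not from either answer): the Categorical's codes give each value's bin position, so indexing the edge arrays with them yields every value's own left and right edge. Values that fall outside all bins get code -1, which would silently pick the last edge, so mask those first if they can occur.

```python
import numpy as np
import pandas as pd

values = np.arange(0, 20)
x, bins = pd.cut(values, 10, retbins=True)

# x.codes[i] is the position of the bin containing values[i]
# (all values are inside the bins here, so no code is -1)
left_edges = bins[:-1][x.codes]
right_edges = bins[1:][x.codes]

# Show the first few values next to the edges of their bins
for v, lo, hi in list(zip(values, left_edges, right_edges))[:3]:
    print(v, round(lo, 3), round(hi, 3))
```

This works the same whether the second argument is a bin count or a sequence of edges, because both paths return the edges through retbins.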
