Can pd.cut use interval range and labels together?

I'm fiddling around with something like this.

bins = [0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2]
labels = ['0', '.25', '.5', '.75', '1', '1.25', '1.5', '1.75', '2']
dataset['RatingScore'] = pd.cut(dataset['Rating'], bins, labels)

What I am actually getting is a range, like this: (0.75, 1.0]

I would like to get results like this: .75 or 1 or 1.25

Is it possible to get a specific number and NOT a range? Thanks.

Andy, your code runs, and it gives me actual numbers rather than ranges, but I'm seeing a lot of gaps too.

You pass labels as the third positional argument of pd.cut. The third parameter of pd.cut is right=..., which accepts True/False. Because labels is a non-empty list, it is treated as True, so pd.cut runs as if no labels were given. You need to pass the list as a keyword argument (labels=labels) for pd.cut to use it as labels. Another thing: bins must contain exactly one more edge than there are labels, so you need to append np.inf to the right end of bins.

import numpy as np
import pandas as pd

s = pd.Series([0.2, 0.6, 0.1, 0.9, 2])
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, np.inf]
labels = ['0', '.25', '.5', '.75', '1', '1.25', '1.5', '1.75', '2']

# Pass labels by keyword so the list is not taken as `right`.
s_cat = pd.cut(s, bins=bins, labels=labels)

Out[1165]:
0       0
1      .5
2       0
3     .75
4    1.75
dtype: category
Categories (9, object): [0 < .25 < .5 < .75 ... 1.25 < 1.5 < 1.75 < 2]
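
To make the difference between the two call styles concrete, here is a minimal sketch that reuses the data from this answer. The names positional and keyword are only illustrative, and the positional behaviour shown is what I would expect from the parameter order described above, so treat it as an assumption for your pandas version.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.2, 0.6, 0.1, 0.9, 2])
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, np.inf]
labels = ['0', '.25', '.5', '.75', '1', '1.25', '1.5', '1.75', '2']

# Positional: the list lands in `right`, so `labels` stays None
# and the categories are intervals such as (0.75, 1.0].
positional = pd.cut(s, bins, labels)

# Keyword: the list is used as the category labels.
keyword = pd.cut(s, bins=bins, labels=labels)

print(positional.cat.categories)  # interval categories
print(keyword.tolist())           # label strings
```

Each label here names the left edge of its bin, which is why 0.9 maps to '.75' and 2 maps to '1.75'.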

If you don't add infinity to the bins, each element of the result is either a float (np.nan, for values that fall outside the bins) or an Interval. If you want the right edge of each interval, you could try the following:

import pandas as pd
import numpy as np

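def fun(x):
    # Values outside every bin come back from pd.cut as NaN (a float).
    if isinstance(x, float):
        return np.nan
    # Otherwise x is a pandas Interval; return its right edge.
    return x.right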

df = pd.DataFrame({"Rating":[.1* i for i in range(10)]})
bins = [0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2]
df["RatingScore"] = pd.cut(df['Rating'], bins)

df["RatingScore"].apply(fun)

0     NaN
1    0.25
2    0.25
3    0.50
4    0.50
5    0.50
6    0.75
7    0.75
8    1.00
9    1.00
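
As a possible alternative to apply, the right edges can be passed as labels directly, since bins[1:] has exactly one entry per interval. This is only a sketch under that assumption; right_edges is an illustrative name, and the astype(float) call is there to turn the categorical result into plain numbers.

```python
import pandas as pd

df = pd.DataFrame({"Rating": [0.1 * i for i in range(10)]})
bins = [0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2]

# Label each bin with its right edge (one label per interval), then
# convert the categorical result to plain floats. Values outside the
# bins (here 0.0, because the first bin is open on the left) become NaN.
right_edges = pd.cut(df["Rating"], bins=bins, labels=bins[1:]).astype(float)
print(right_edges.tolist())
```

Using bins[:-1] instead would label each bin with its left edge.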
