Can pd.cut use both interval ranges and labels?

Question

I'm fiddling around with something like this.

bins = [0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2]
labels = ['0', '.25', '.5', '.75', '1', '1.25', '1.5', '1.75', '2']
dataset['RatingScore'] = pd.cut(dataset['Rating'], bins, labels)

What I am actually getting is a range, like this: (0.75, 1.0]

I would like to get results like this: .75 or 1 or 1.25

Is it possible to get a specific number and NOT a range? Thanks.

Andy, your code runs, and it gives me actual numbers rather than ranges, but I'm seeing a lot of gaps too.

Answer 1

You are passing labels as the third positional argument of pd.cut, but the third parameter of pd.cut is right, which expects True or False. labels is a non-empty list, so it is treated as True, and pd.cut runs as if no labels were given. Pass the list by keyword (labels=labels) so that it is actually used as the labels. Also, the number of bin edges must be exactly one more than the number of labels, so you need to append np.inf to the end of bins.
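To see why the positional call goes wrong, it can help to look at the parameter order of pd.cut directly. The snippet below is only a small sketch using the standard-library inspect module; the variable name params is made up for illustration.

```python
import inspect

import pandas as pd

# List the parameter names of pd.cut in the order they are declared.
params = list(inspect.signature(pd.cut).parameters)

# The third positional slot is `right`; `labels` comes after it,
# so pd.cut(values, bins, labels) binds the list to `right`.
print(params[:4])  # ['x', 'bins', 'right', 'labels']
```

Passing labels by keyword avoids depending on this ordering at all.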

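import numpy as np
import pandas as pd

s = pd.Series([0.2, 0.6, 0.1, 0.9, 2])
# One more bin edge than labels; np.inf closes the last bin
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, np.inf]
labels = ['0', '.25', '.5', '.75', '1', '1.25', '1.5', '1.75', '2']

# Pass labels by keyword so it is not bound to `right`
s_cat = pd.cut(s, bins=bins, labels=labels)
s_cat

Out[1165]: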
0       0
1      .5
2       0
3     .75
4    1.75
dtype: category
Categories (9, object): [0 < .25 < .5 < .75 ... 1.25 < 1.5 < 1.75 < 2]
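Since the question asks for actual numbers, one possible variation is to ask pd.cut for integer bin codes with labels=False and then look up the left edge of each bin. This is a sketch under the assumption that every value falls inside some bin (a NaN would turn the codes into floats and break the lookup); the names codes, left_edges and numeric are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.2, 0.6, 0.1, 0.9, 2.0])
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, np.inf]

# labels=False returns the integer position of the bin for each value.
codes = pd.cut(s, bins=bins, labels=False)

# Use the left edge of each bin as its numeric "label".
# Assumes no value falls outside the bins (no NaN codes).
left_edges = np.asarray(bins[:-1], dtype=float)
numeric = pd.Series(left_edges[codes.to_numpy()], index=s.index)

print(numeric.tolist())  # [0.0, 0.5, 0.0, 0.75, 1.75]
```

The result is a plain float Series rather than a categorical, which may be easier to use in later arithmetic.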

Answer 2

If you don't add infinity to the bins, each output value is either a float (np.nan) or an Interval. If you want the right edge of each interval, you could try the following:

import pandas as pd
import numpy as np

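def fun(x):
    # Values that fall outside every bin come back as NaN (a float)
    if isinstance(x, float):
        return np.nan
    # Otherwise x is an Interval; return its right edge
    return x.right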

df = pd.DataFrame({"Rating": [0.1 * i for i in range(10)]})
bins = [0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2]
df["RatingScore"] = pd.cut(df['Rating'], bins)

df["RatingScore"].apply(fun)

0     NaN
1    0.25
2    0.25
3    0.50
4    0.50
5    0.50
6    0.75
7    0.75
8    1.00
9    1.00
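The same right edges can probably be obtained without a Python-level apply by wrapping the result of pd.cut in a pd.IntervalIndex and reading its right attribute. This is a sketch of that alternative, not part of the original answer; the column name RatingRight is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Rating": [0.1 * i for i in range(10)]})
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

# pd.cut yields Interval values (NaN for 0.0, which lies outside (0, 0.25]).
intervals = pd.cut(df["Rating"], bins)

# IntervalIndex exposes the right edges as a numeric index; NaN stays NaN.
df["RatingRight"] = pd.IntervalIndex(intervals).right.to_numpy()

print(df)
```

The values in RatingRight should match the output of the apply-based version above, with NaN in the first row.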

2 answers

Answer 1: 3, accepted, 2020-01-23 18:28:11
Answer 2: 1, 2020-01-23 18:04:33
