Can pd.cut use interval range and labels together?
I'm fiddling around with something like this.
bins = [0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2]
labels = ['0', '.25', '.5', '.75', '1', '1.25', '1.5', '1.75', '2']
dataset['RatingScore'] = pd.cut(dataset['Rating'], bins, labels)
What I am actually getting is a range, like this: (0.75, 1.0]
I would like to get results like this: .75 or 1 or 1.25
Is it possible to get a specific number and NOT a range? Thanks.
Andy, your code runs, and it gives me actual numbers, rather than ranges, but I'm seeing a lot of gaps too.
You pass labels as the 3rd positional argument of pd.cut. The third parameter of pd.cut is right=..., which accepts True/False as values. labels is a non-empty list, so it is treated as True. Therefore, pd.cut executes as if no labels were given. You need to use a keyword argument to pass the list labels as the labels for pd.cut.
Another thing: bins must have one more item than labels. You need to add np.inf to the right end of the list bins:
import numpy as np
import pandas as pd

s = pd.Series([0.2, 0.6, 0.1, 0.9, 2])
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, np.inf]
labels = ['0', '.25', '.5', '.75', '1', '1.25', '1.5', '1.75', '2']
s_cat = pd.cut(s, bins=bins, labels=labels)
Out[1165]:
0       0
1      .5
2       0
3     .75
4    1.75
dtype: category
Categories (9, object): [0 < .25 < .5 < .75 ... 1.25 < 1.5 < 1.75 < 2]
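Since the question asks for numbers rather than strings, here is a minimal sketch of the same call with numeric labels. It first checks the parameter order of pd.cut with inspect, then labels each bin by its left edge. The names numeric_labels and s_num are illustrative and not from the original answer, and using the left edge is an assumption that mirrors the string labels above.

```python
import inspect

import numpy as np
import pandas as pd

# Leading parameters of pd.cut: `right` sits in the third slot, before `labels`.
print(list(inspect.signature(pd.cut).parameters)[:4])
# ['x', 'bins', 'right', 'labels']

s = pd.Series([0.2, 0.6, 0.1, 0.9, 2])
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, np.inf]

# Assumption: label each bin by its left edge, as floats instead of strings.
# There is one label per bin, i.e. one fewer label than bin edges.
numeric_labels = [float(edge) for edge in bins[:-1]]

s_cat = pd.cut(s, bins=bins, labels=numeric_labels)

# The result is still categorical; cast it to get a plain float Series.
s_num = s_cat.astype(float)
print(s_num.tolist())
# [0.0, 0.5, 0.0, 0.75, 1.75]
```

Casting with astype(float) is only needed if you want to do arithmetic on the result; the categorical already displays the numeric labels.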
If you don't add infinity to the bins, each output value is either a float (np.nan) or an Interval. Say you want the right edge of each interval; you could try the following:
import pandas as pd
import numpy as np

def fun(x):
    # Values outside every bin come back as float NaN; the rest are Interval objects.
    if isinstance(x, float):
        return np.nan
    else:
        return x.right

df = pd.DataFrame({"Rating": [.1 * i for i in range(10)]})
bins = [0, .25, .5, .75, 1, 1.25, 1.5, 1.75, 2]
df["RatingScore"] = pd.cut(df['Rating'], bins)
df["RatingScore"].apply(fun)
0     NaN
1    0.25
2    0.25
3    0.50
4    0.50
5    0.50
6    0.75
7    0.75
8    1.00
9    1.00
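A possible alternative to the apply step, sketched here under the assumption that the right edge of each bin is the number you want: pass the right edges as labels directly, so no helper function is needed. The names right_edges and scores are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Rating": [0.1 * i for i in range(10)]})
bins = [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

# Assumption: label each bin by its right edge, giving one label per bin.
right_edges = [float(edge) for edge in bins[1:]]

scores = pd.cut(df["Rating"], bins=bins, labels=right_edges)

# Values that fall in no bin (here 0, because bins are open on the left) stay NaN.
df["RatingScore"] = scores.astype(float)
print(df["RatingScore"].tolist())
```

If 0 should land in the first bin instead of becoming NaN, pd.cut also takes include_lowest=True.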