Subset pandas DataFrame based on a bin

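I am trying to subset a pandas DataFrame based on a categorical bin. (I know you can subset based on the values themselves; this is a different problem where I actually need to bin the data, and the example below is just a representation.) I feel like I'm missing something about subsetting, but I can't find what it is in the documentation. Here is an example: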

import numpy as np
import pandas as pd

np.random.seed(9876)

# Generating random data for binning.
bin_step = 0.5
random_data = np.random.uniform(low = 0, high = 10, size = 30)

# Generating bin ranges
bin_ranges = np.arange(start = random_data.min(),
                       stop = random_data.max() + random_data.max()*0.1,
                       step = bin_step)

# Cutting the random data into predefined bins.
bins = pd.cut(random_data.tolist(), 
              bin_ranges,
              right = True,
              include_lowest = True)

# Aggregating into a pandas DataFrame
random_data_pd = pd.Series(random_data.tolist(), name = 'values')
bins_transformed = pd.Series(bins, name = 'bins')

df = pd.concat([bins_transformed, random_data_pd], axis = 1)

When I try to subset on a bin, for example (5.086, 5.586], the comparison returns all False. Why doesn't this subset?

df.bins == '(5.086, 5.586]' #returns all false.

If I understand correctly, the reason is that you are using == between different types: pd.Interval vs. str. Please check my example.

print(type(df.bins[0]))

<class 'pandas._libs.interval.Interval'>
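Since each bin is an Interval rather than a str, one possible workaround is to compare string labels on both sides. The following is a minimal sketch on small made-up data (the `values` and `edges` lists are illustrative assumptions, not the data from the question); it assumes the string form of each bin matches the label you type.

```python
import pandas as pd

# Hypothetical toy data, chosen so that the bin edges are exact.
values = [0.2, 0.7, 1.3, 1.8]
edges = [0.0, 0.5, 1.0, 1.5, 2.0]
bins = pd.Series(pd.cut(values, edges), name="bins")

# Interval vs. str: no element compares equal to the string.
mask_str = bins == "(0.5, 1.0]"

# Cast each bin to its string label so both sides are strings.
mask_label = bins.astype(str) == "(0.5, 1.0]"

print(mask_str.tolist())
print(mask_label.tolist())
```

This relies on the printed label matching the string exactly, so it can be fragile when the bin edges are long floats.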

print(df.bins)
print(df.bins == pd.Interval(5.1, 5.2))

0     (1.586, 2.086]
1     (6.086, 6.586]
2     (8.586, 9.086]
3     (7.586, 8.086]
4     (5.086, 5.586]
5     (0.585, 1.086]
6     (4.586, 5.086]
7     (1.086, 1.586]
8     (9.086, 9.586]
9     (4.586, 5.086]
10    (1.586, 2.086]
11    (1.086, 1.586]
12    (2.586, 3.086]
13    (2.586, 3.086]
14    (1.086, 1.586]
15    (8.086, 8.586]
16    (7.086, 7.586]
17    (6.586, 7.086]
18    (8.586, 9.086]
19    (7.586, 8.086]
20    (7.586, 8.086]
21    (0.585, 1.086]
22    (4.586, 5.086]
23    (9.086, 9.586]
24    (8.086, 8.586]
25    (6.586, 7.086]
26    (5.086, 5.586]
27    (6.586, 7.086]
28    (5.086, 5.586]
29    (9.086, 9.586]
Name: bins, dtype: category
Categories (19, interval[float64]): [(0.585, 1.086] < (1.086, 1.586] < (1.586, 2.086] <
                                     (2.086, 2.586] ... (8.086, 8.586] < (8.586, 9.086] <
                                     (9.086, 9.586] < (9.586, 10.086]]
0     False
1     False
2     False
3     False
4      True
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26     True
27    False
28     True
29    False
Name: bins, dtype: bool

Subsetting...

print(df[df.bins == pd.Interval(5.1, 5.2)])

              bins    values
4   (5.086, 5.586]  5.132422
26  (5.086, 5.586]  5.309666
28  (5.086, 5.586]  5.574920
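Whether a hand-typed interval such as pd.Interval(5.1, 5.2) matches a bin may depend on the pandas version, so here is a hedged alternative that takes the Interval from the column's own categories instead. It is a sketch on small made-up data (the `values`, `edges`, and probe value 0.75 are illustrative assumptions) and assumes the bins do not overlap, so that the lookup returns a single position.

```python
import pandas as pd

# Hypothetical toy data, not the data from the question.
values = [0.2, 0.7, 0.9, 1.8]
edges = [0.0, 0.5, 1.0, 1.5, 2.0]
df = pd.DataFrame({"bins": pd.cut(values, edges), "values": values})

# The categories of a cut() result form an IntervalIndex.
categories = df["bins"].cat.categories

# Find the bin that contains the probe value, then fetch that exact Interval.
target = categories[categories.get_loc(0.75)]

# Comparing against an Interval taken from the categories is an exact match.
subset = df[df["bins"] == target]
print(subset)
```

Because `target` comes from the categories themselves, there is no need to retype rounded bin edges by hand.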
