Subset a pandas DataFrame based on bins

Question

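I am trying to subset a pandas DataFrame based on categorical bins. (I know you can subset on the values themselves, but that is a different question: I actually need to bin the data, and this is just a simplified representation.) I feel like I'm missing something about subsetting, but I can't find what it is in the documentation. Here is an example: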

import numpy as np
import pandas as pd

np.random.seed(9876)

# Generating random data for binning.
bin_step = 0.5
random_data = np.random.uniform(low = 0, high = 10, size = 30)

# Generating bin ranges
bin_ranges = np.arange(start = random_data.min(),
                       stop = random_data.max() + random_data.max()*0.1,
                       step = bin_step)

# Cutting the random data into predefined bins.
bins = pd.cut(random_data.tolist(), 
              bin_ranges,
              right = True,
              include_lowest = True)

# Aggregating into a pandas DataFrame
random_data_pd = pd.Series(random_data.tolist(), name = 'values')
bins_transformed = pd.Series(bins, name = 'bins')

df = pd.concat([bins_transformed, random_data_pd], axis = 1)
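
To see what the bins column actually holds, it helps to inspect its dtype and categories. A minimal sketch on small, hypothetical values and bin edges (not the seeded random data above), chosen so the result is easy to check by hand:

```python
import pandas as pd

# Hypothetical values and bin edges, chosen only to keep the example small
values = pd.Series([0.2, 0.7, 1.2, 0.9], name="values")
bins = pd.cut(values, bins=[0.0, 0.5, 1.0, 1.5], right=True).rename("bins")
df = pd.concat([bins, values], axis=1)

# pd.cut returns an ordered categorical whose categories are pd.Interval objects
print(df["bins"].dtype)
print(df["bins"].cat.ordered)  # True
print(list(df["bins"].cat.categories))
```

So the cells of the column are Interval instances, not the strings they print as. The precision argument of pd.cut (default 3) sets the precision at which the bin labels are stored and displayed, which is why the question's edges appear as values like 5.086.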

When I try to subset on, for example, the (5.086, 5.586] bin, the comparison returns all False. Why doesn't this subset the DataFrame?

df.bins == '(5.086, 5.586]'  # returns all False

Answer 1

If I understand correctly, the reason is that you are using == to compare two different types: pd.Interval vs. str. Check my example:

print(type(df.bins[0]))

<class 'pandas._libs.interval.Interval'>
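
Because each cell is a pd.Interval, a string such as '(5.086, 5.586]' never equals it, and == against a scalar that is not one of the categories yields False for every row. A minimal sketch reproducing this on hypothetical data, plus one workaround, assuming you prefer to keep matching on the printed label: convert the column to strings first.

```python
import pandas as pd

# Hypothetical data; edges chosen so the labels print exactly as written below
values = pd.Series([0.2, 0.7, 1.2, 0.9], name="values")
df = pd.concat([pd.cut(values, bins=[0.0, 0.5, 1.0, 1.5]).rename("bins"), values], axis=1)

# Comparing Interval cells with a str: nothing matches
print((df["bins"] == "(0.5, 1.0]").any())  # False

# Workaround: compare the string form of each Interval with the printed label
label_mask = df["bins"].astype(str) == "(0.5, 1.0]"
print(df[label_mask])
```

The string route only works if the label you type matches str() of the Interval character for character, so it is more fragile than comparing Interval objects.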

print(df.bins)
print(df.bins == pd.Interval(5.1, 5.2))

0     (1.586, 2.086]
1     (6.086, 6.586]
2     (8.586, 9.086]
3     (7.586, 8.086]
4     (5.086, 5.586]
5     (0.585, 1.086]
6     (4.586, 5.086]
7     (1.086, 1.586]
8     (9.086, 9.586]
9     (4.586, 5.086]
10    (1.586, 2.086]
11    (1.086, 1.586]
12    (2.586, 3.086]
13    (2.586, 3.086]
14    (1.086, 1.586]
15    (8.086, 8.586]
16    (7.086, 7.586]
17    (6.586, 7.086]
18    (8.586, 9.086]
19    (7.586, 8.086]
20    (7.586, 8.086]
21    (0.585, 1.086]
22    (4.586, 5.086]
23    (9.086, 9.586]
24    (8.086, 8.586]
25    (6.586, 7.086]
26    (5.086, 5.586]
27    (6.586, 7.086]
28    (5.086, 5.586]
29    (9.086, 9.586]
Name: bins, dtype: category
Categories (19, interval[float64]): [(0.585, 1.086] < (1.086, 1.586] < (1.586, 2.086] <
                                     (2.086, 2.586] ... (8.086, 8.586] < (8.586, 9.086] <
                                     (9.086, 9.586] < (9.586, 10.086]]
0     False
1     False
2     False
3     False
4      True
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17    False
18    False
19    False
20    False
21    False
22    False
23    False
24    False
25    False
26     True
27    False
28     True
29    False
Name: bins, dtype: bool

And the subset itself:

print(df[df.bins == pd.Interval(5.1, 5.2)])

              bins    values
4   (5.086, 5.586]  5.132422
26  (5.086, 5.586]  5.309666
28  (5.086, 5.586]  5.574920
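
In the pandas release used for this answer, pd.Interval(5.1, 5.2) matched the (5.086, 5.586] bin. Newer releases may only treat an Interval as equal to a category when its edges and closed side match exactly, in which case that comparison selects nothing. A sketch that does not depend on that behavior, again on hypothetical data: take the Interval straight from the categories, or locate the bin containing a given point with IntervalIndex.get_loc and compare category codes.

```python
import pandas as pd

# Hypothetical data standing in for the question's random values
values = pd.Series([0.2, 0.7, 1.2, 0.9], name="values")
df = pd.concat([pd.cut(values, bins=[0.0, 0.5, 1.0, 1.5]).rename("bins"), values], axis=1)
categories = df["bins"].cat.categories  # IntervalIndex of the bins

# Option 1: reuse the exact Interval object stored as a category
target = categories[1]  # the (0.5, 1.0] bin
by_interval = df[df["bins"] == target]

# Option 2: find the bin containing a point, then compare integer category codes
pos = categories.get_loc(0.8)  # position of the interval that contains 0.8
by_point = df[df["bins"].cat.codes == pos]

print(by_interval)
print(by_point)
```

Both options avoid retyping floating-point edges by hand; the point 0.8 and the index 1 are arbitrary choices for illustration.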

Answer 1
1 2017-08-17 05:42:26