
Using pandas pivot_table with Interval columns results in TypeError

      cat1             cat2                       col_a             col_b
0    (34.0, 38.0]    (15.9, 47.0]             29               10
1    (34.0, 38.0]    (15.9, 47.0]             37               22
2    (28.0, 34.0]    (47.0, 56.0]              3               13
3    (34.0, 38.0]    (47.0, 56.0]             15                7
4    (28.0, 34.0]    (56.0, 67.0]             42               20
5    (28.0, 34.0]    (47.0, 56.0]             31               23
6    (28.0, 34.0]    (56.0, 67.0]             26               17
7    (28.0, 34.0]    (56.0, 67.0]              7                1
8    (28.0, 34.0]    (56.0, 67.0]             36               19
9    (19.0, 28.0]    (56.0, 67.0]              5                7
10   (19.0, 28.0]    (56.0, 67.0]             21                5
11   (28.0, 34.0]    (67.0, 84.0]             37               13

On the DataFrame above, I want to perform this pivot table operation with pandas:

pd.pivot_table(df, index='cat1', columns='cat2', values='col_a')

But I get the error:

TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

col_a and col_b are both of type int32, while cat1 and cat2 are both categorical. How do I get rid of this error?

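This is a bug associated with pivoting Intervals (see GH25814) and will be fixed for v0.25. See also a related question about crosstab: "pandas crosstab on CategoricalDType columns throws TypeError".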

In the meantime, here are a couple of options. To aggregate, you will have to use pivot_table and convert the categorical columns to strings before pivoting:

df2 = df.assign(cat1=df['cat1'].astype(str), cat2=df['cat2'].astype(str))
# to aggregate by taking the mean of col_a
df2.pivot_table(index='cat1', columns='cat2', values='col_a', aggfunc='mean')

The caveat is that you lose the benefit of having the index and columns be Intervals.
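To make that caveat concrete, here is a small sketch of what changes once the labels are plain strings. The bins and values are made up purely for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical data and bins, chosen only to illustrate the caveat.
values = pd.Series([3, 7, 15])
binned = pd.cut(values, bins=[0, 5, 10, 20])

# Interval categories keep their numeric order and can be searched by value.
intervals = binned.cat.categories
position = intervals.get_loc(7)   # position of the interval that contains 7

# After astype(str) the labels are ordinary strings and sort lexicographically,
# so numeric order is lost: '(10, 20]' sorts before '(5, 10]'.
labels = sorted(binned.astype(str).unique())

print(list(intervals))
print(position)
print(labels)
```

If the pivoted result only needs to be displayed, the string route is probably fine; it matters mostly when later code relies on interval lookups or on numeric ordering.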

Another option is to pivot on the categorical codes, then reassign the categories:

df2 = df.assign(cat1=df['cat1'].cat.codes, cat2=df['cat2'].cat.codes)
pivot = df2.pivot_table(
    index='cat1', columns='cat2', values='col_a', aggfunc='mean')

pivot.index = df['cat1'].cat.categories
pivot.columns = df['cat2'].cat.categories

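The assignment works because pivot_table sorts the codes, and the codes follow the order of the categories. It does assume that every category appears in the data; otherwise the lengths will not match.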
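When some bins are empty, assigning the full `.cat.categories` would raise a length mismatch, because the pivot only contains the codes that actually occur. A possible variant, sketched here on the data from the minimal sample, looks up just those codes with `take`. It assumes there are no missing values in cat1 or cat2 (a missing value gets code -1, which `take` would silently map to the last category).

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'cat1': np.random.choice(100, 10),
    'cat2': np.random.choice(100, 10),
    'col_a': np.random.randint(1, 50, 10)})
df['cat1'] = pd.cut(df['cat1'], bins=np.arange(0, 101, 10))
df['cat2'] = pd.cut(df['cat2'], bins=np.arange(0, 101, 10))

# Pivot on the integer category codes instead of the Interval values.
df2 = df.assign(cat1=df['cat1'].cat.codes, cat2=df['cat2'].cat.codes)
pivot = df2.pivot_table(
    index='cat1', columns='cat2', values='col_a', aggfunc='mean')

# Map each surviving code back to its Interval by position.
# Assumption: no code is -1 (i.e. no missing values in cat1 or cat2).
pivot.index = df['cat1'].cat.categories.take(pivot.index.to_numpy())
pivot.columns = df['cat2'].cat.categories.take(pivot.columns.to_numpy())

print(pivot)
```

The result has an IntervalIndex on both axes, at the cost of dropping the axis names cat1 and cat2, just as the plain assignment does.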


Minimal Code Sample

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame({
    'cat1': np.random.choice(100, 10), 
    'cat2': np.random.choice(100, 10), 
    'col_a': np.random.randint(1, 50, 10)})

df['cat1'] = pd.cut(df['cat1'], bins=np.arange(0, 101, 10))
df['cat2'] = pd.cut(df['cat2'], bins=np.arange(0, 101, 10))

df
       cat1      cat2  col_a
0  (40, 50]  (60, 70]     18
1  (40, 50]  (80, 90]     38
2  (60, 70]  (80, 90]     26
3  (60, 70]  (10, 20]     14
4  (60, 70]  (50, 60]      9
5   (0, 10]  (60, 70]     10
6  (80, 90]  (30, 40]     21
7  (20, 30]  (80, 90]     17
8  (30, 40]  (40, 50]      6
9  (80, 90]  (80, 90]     16

(df.assign(cat1=df['cat1'].astype(str), cat2=df['cat2'].astype(str))
   .pivot_table(index='cat1', columns='cat2', values='col_a', aggfunc='mean'))

cat2      (10, 20]  (30, 40]  (40, 50]  (50, 60]  (60, 70]  (80, 90]
cat1                                                                
(0, 10]        NaN       NaN       NaN       NaN      10.0       NaN
(20, 30]       NaN       NaN       NaN       NaN       NaN      17.0
(30, 40]       NaN       NaN       6.0       NaN       NaN       NaN
(40, 50]       NaN       NaN       NaN       NaN      18.0      38.0
(60, 70]      14.0       NaN       NaN       9.0       NaN      26.0
(80, 90]       NaN      21.0       NaN       NaN       NaN      16.0
