Pandas pivot_table and categorical variables as values

Question

I am puzzled by how pivot_table works with categorical variables used as values.

I searched the web for "pandas pivot_table categorical variables" and found some information, but nothing that really explains to me why I see what I see.

Test dataframe:

import pandas as pd

test_df = pd.DataFrame.from_dict({'val': ['pass','pass','fail','pass'], "col_a": ['a','b','a','b'], "col_b": ['x','x','y','y']})
test_df
    val col_a col_b
0  pass     a     x
1  pass     b     x
2  fail     a     y
3  pass     b     y

Then I proceed to reshape it. I come from R/data.table, where this would be a cast.

test_df.pivot_table(index = "col_a", columns = "col_b", values = 'val')

and I get this:

/tmp/ipykernel_153608/3910840210.py:1: FutureWarning:

Dropping invalid columns in DataFrameGroupBy.mean is deprecated. In a future version, a TypeError will be raised. Before calling .mean, select only columns which should be valid for the function.

col_b
col_a
a
b

Empty result, but with indices. After doing a million tests (on my real object, such as testing if there were duplicated values, NAs, etc.), this seems to work:

test_df.pivot_table(index = "col_a", columns = "col_b", values = 'val', aggfunc=lambda x: x)
col_b     x     y
col_a
a      pass  fail
b      pass  pass

which is what I want. Besides the "dude, if it works, take it and be happy", does anyone know why I have to pass the aggregation function?

Answer 1

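A pivot_table exists in order to aggregate data. Its default aggfunc is the mean, which cannot be computed for strings like 'pass' and 'fail'; that is why val is dropped with the FutureWarning you saw and the result is empty. If you just want to pivot the data and not aggregate it, then use pivot: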

test_df.pivot(index='col_a', columns='col_b')

        val      
col_b     x     y
col_a            
a      pass  fail
b      pass  pass
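
As a small variation on the call above, passing values='val' to pivot should give plain x and y columns instead of the two-level column header. A minimal sketch, rebuilding the question's test_df so that it runs on its own:

```python
import pandas as pd

# Same data as the question's test_df
test_df = pd.DataFrame({
    "val": ["pass", "pass", "fail", "pass"],
    "col_a": ["a", "b", "a", "b"],
    "col_b": ["x", "x", "y", "y"],
})

# values="val" selects a single column, so the result has flat columns x and y
wide = test_df.pivot(index="col_a", columns="col_b", values="val")
print(wide)
```

This only works because every (col_a, col_b) pair occurs exactly once in test_df.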

In your actual data, if there are duplicate (col_a, col_b) pairs, pivot will fail, so you would need to aggregate the data and use pivot_table with an aggfunc such as 'first'.
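
To illustrate the duplicate case, here is a rough sketch. The dup_df frame is made up for this example (it is not the asker's real data) and repeats the (a, x) pair, so pivot has no unique cell to fill, while pivot_table with aggfunc='first' keeps the first value per pair:

```python
import pandas as pd

# Hypothetical data: the (a, x) pair appears twice
dup_df = pd.DataFrame({
    "val": ["pass", "fail", "fail", "pass", "pass"],
    "col_a": ["a", "a", "a", "b", "b"],
    "col_b": ["x", "x", "y", "x", "y"],
})

try:
    dup_df.pivot(index="col_a", columns="col_b", values="val")
    pivot_failed = False
except ValueError:
    # pivot raises ValueError when index/column pairs are not unique
    pivot_failed = True

# aggfunc="first" keeps the first value seen for each (col_a, col_b) pair
wide = dup_df.pivot_table(
    index="col_a", columns="col_b", values="val", aggfunc="first"
)
print(pivot_failed)
print(wide)
```

Whether 'first' is the right choice depends on what the duplicates mean in your data; it silently discards the later values.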

Answer 1
2 Accepted 2022-05-11 20:04:04
