How do I create a pivot table from a dataframe that has a column containing lists?

I have a dataframe that looks like this:

import pandas as pd

data = [
  {
    "userId": 1,
    "binary_vote": 0,
    "genres": [
      "Adventure",
      "Comedy"
    ]
  },
  {
    "userId": 1,
    "binary_vote": 1,
    "genres": [
      "Adventure",
      "Drama"
    ]
  },
  {
    "userId": 2,
    "binary_vote": 0,
    "genres": [
      "Comedy",
      "Drama"
    ]
  },
  {
    "userId": 2,
    "binary_vote": 1,
    "genres": [
      "Adventure",
      "Drama"
    ]
  },
]

df = pd.DataFrame(data)
print(df)

   userId  binary_vote               genres
0  1       0            [Adventure, Comedy]
1  1       1            [Adventure, Drama]
2  2       0            [Comedy, Drama]
3  2       1            [Adventure, Drama]

I want to create one column for each value of binary_vote. Here is the expected output:

   userId        binary_vote_0       binary_vote_1
0  1       [Adventure, Comedy]  [Adventure, Drama]
1  2       [Comedy, Drama]      [Adventure, Drama]

I tried something like this, but I get an error:

pd.pivot_table(df, columns=['binary_vote'], values='genres')

Here is the error:

DataError: No numeric types to aggregate

Any ideas? Thanks in advance.

We have to supply our own aggfunc; in this case it is a simple one.

Your attempt failed because pivot_table uses mean as its default aggregation function, and a mean cannot be computed over a column of lists.

piv = (
    df.pivot_table(index='userId', columns='binary_vote', values='genres', aggfunc=lambda x: x)
      .add_prefix('binary_vote_')
      .reset_index()
      .rename_axis(None, axis=1)
)
print(piv)
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
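
When each (userId, binary_vote) pair occurs only once, as in the question's data, the built-in 'first' aggregation may be a more explicit alternative to the identity lambda. The following is a minimal sketch under that assumption; the data is rebuilt from the question so the snippet runs on its own. If a pair occurred more than once, 'first' would silently keep only the first list.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "binary_vote": [0, 1, 0, 1],
    "genres": [
        ["Adventure", "Comedy"],
        ["Adventure", "Drama"],
        ["Comedy", "Drama"],
        ["Adventure", "Drama"],
    ],
})

piv = (
    # 'first' returns the first value of each group; it works on object
    # columns, so the lists are passed through unchanged.
    df.pivot_table(index="userId", columns="binary_vote",
                   values="genres", aggfunc="first")
      .add_prefix("binary_vote_")   # 0 -> binary_vote_0, 1 -> binary_vote_1
      .reset_index()                # turn userId back into a regular column
      .rename_axis(None, axis=1)    # drop the leftover 'binary_vote' columns name
)
print(piv)
```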

Another way, using set_index() and unstack():

# Move userId and binary_vote into the index, then unstack binary_vote into columns
m = (df.set_index(['userId', 'binary_vote']).unstack()
       .add_prefix('binary_vote_').droplevel(level=0, axis=1))
print(m.reset_index().rename_axis(None, axis=1))

   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
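
Reshaping without aggregation, as set_index() plus unstack() does, only works when each (userId, binary_vote) pair is unique; otherwise pandas raises a ValueError about duplicate entries. The sketch below checks for that first and then uses DataFrame.pivot, which also reshapes without aggregating. The names keys, has_duplicates and wide are illustrative, and keeping the first row per pair is an assumption made for this example, not something the question specifies.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "binary_vote": [0, 1, 0, 1],
    "genres": [
        ["Adventure", "Comedy"],
        ["Adventure", "Drama"],
        ["Comedy", "Drama"],
        ["Adventure", "Drama"],
    ],
})

keys = ["userId", "binary_vote"]

# Only the key columns are inspected, so the unhashable lists are not a problem.
has_duplicates = bool(df.duplicated(subset=keys).any())

if has_duplicates:
    # Assumption for this sketch: keep the first row for each key pair.
    df = df.drop_duplicates(subset=keys, keep="first")

wide = (
    df.pivot(index="userId", columns="binary_vote", values="genres")
      .add_prefix("binary_vote_")
      .reset_index()
      .rename_axis(None, axis=1)
)
print(wide)
```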
