How do I create a pivot table from a dataframe that has a column containing lists?
I have a dataframe that looks like this:
import pandas as pd
data = [
    {
        "userId": 1,
        "binary_vote": 0,
        "genres": [
            "Adventure",
            "Comedy"
        ]
    },
    {
        "userId": 1,
        "binary_vote": 1,
        "genres": [
            "Adventure",
            "Drama"
        ]
    },
    {
        "userId": 2,
        "binary_vote": 0,
        "genres": [
            "Comedy",
            "Drama"
        ]
    },
    {
        "userId": 2,
        "binary_vote": 1,
        "genres": [
            "Adventure",
            "Drama"
        ]
    },
]
df = pd.DataFrame(data)
print(df)
   userId  binary_vote               genres
0       1            0  [Adventure, Comedy]
1       1            1   [Adventure, Drama]
2       2            0      [Comedy, Drama]
3       2            1   [Adventure, Drama]
I want to create columns from the values of binary_vote. This is the expected output:
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
I tried something like this, but I get an error:
pd.pivot_table(df, columns=['binary_vote'], values='genres')
Here is the error:
DataError: No numeric types to aggregate
Any ideas? Thanks in advance.
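We have to supply our own aggfunc, and in this case it is a simple one.
Your attempt fails because pivot_table tries to take the mean, which is the default aggregation function. Obviously, that cannot work on a column of lists.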
piv = (
df.pivot_table(index='userId', columns='binary_vote', values='genres', aggfunc=lambda x: x)
.add_prefix('binary_vote_')
.reset_index()
.rename_axis(None, axis=1)
)
print(piv)
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
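As a hedged alternative to the identity lambda, the built-in 'first' aggregation can be passed as aggfunc, which makes it explicit which list is kept if a (userId, binary_vote) pair ever occurs more than once. This is only a minimal sketch: the name wide is illustrative and not part of the original answer, and the sample frame is rebuilt in column form so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as in the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "binary_vote": [0, 1, 0, 1],
    "genres": [
        ["Adventure", "Comedy"],
        ["Adventure", "Drama"],
        ["Comedy", "Drama"],
        ["Adventure", "Drama"],
    ],
})

# 'first' keeps the first genres list seen for each (userId, binary_vote) pair,
# so no numeric aggregation such as mean is attempted on the lists.
wide = (
    df.pivot_table(index="userId", columns="binary_vote",
                   values="genres", aggfunc="first")
      .add_prefix("binary_vote_")   # column labels 0/1 become binary_vote_0/binary_vote_1
      .reset_index()                # turn userId back into a regular column
      .rename_axis(None, axis=1)    # drop the leftover 'binary_vote' columns name
)
print(wide)
```

If a pair can occur several times and all lists should be kept, a different aggfunc would be needed; 'first' silently discards the later rows.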
Another way, using set_index() and unstack():
m = (df.set_index(['userId', 'binary_vote']).unstack()
       .add_prefix('binary_vote_').droplevel(level=0, axis=1))
m.reset_index().rename_axis(None, axis=1)
   userId        binary_vote_0       binary_vote_1
0       1  [Adventure, Comedy]  [Adventure, Drama]
1       2      [Comedy, Drama]  [Adventure, Drama]
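One caveat with the set_index() and unstack() route: unstack() needs every (userId, binary_vote) pair to be unique, otherwise it raises a ValueError about duplicate index entries. The sketch below checks for that first and then unstacks only the genres column, which avoids the extra column level. The names has_dupes and wide are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the question, rebuilt here so the snippet is self-contained.
df = pd.DataFrame({
    "userId": [1, 1, 2, 2],
    "binary_vote": [0, 1, 0, 1],
    "genres": [
        ["Adventure", "Comedy"],
        ["Adventure", "Drama"],
        ["Comedy", "Drama"],
        ["Adventure", "Drama"],
    ],
})

# Check only the key columns; the list column is unhashable and is left out of the check.
has_dupes = df.duplicated(subset=["userId", "binary_vote"]).any()
if has_dupes:
    raise ValueError("duplicate (userId, binary_vote) pairs: aggregate before unstacking")

# Selecting the 'genres' Series before unstack() yields single-level columns 0 and 1.
wide = (
    df.set_index(["userId", "binary_vote"])["genres"]
      .unstack()
      .add_prefix("binary_vote_")
      .reset_index()
      .rename_axis(None, axis=1)
)
print(wide)
```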
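Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.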