Python 3: Transpose columns of Pandas Data Frame / "melt" data frame
I have a Pandas Data Frame like this:
uid category count
0 1 comedy 5
1 1 drama 7
2 2 drama 4
3 3 other 10
4 3 comedy 6
Except there are dozens of categories, millions of rows, and a few dozen other columns.
I want to turn that into something like this:
id cat_comedy cat_drama cat_other
0 1 5 7 0
1 2 0 4 0
2 3 6 0 10
I have no idea how to do this and am looking for tips/hints/full solutions. I don't really care about the row index.
Thanks.
I think this is what you're after (the operation is called a 'pivot'):
from pandas import DataFrame

df = DataFrame([
    {'id': 1, 'category': 'comedy', 'count': 5},
    {'id': 1, 'category': 'drama', 'count': 7},
    {'id': 2, 'category': 'drama', 'count': 4},
    {'id': 3, 'category': 'other', 'count': 10},
    {'id': 3, 'category': 'comedy', 'count': 6}
]).set_index('id')  # id becomes the row index, so the pivot keeps one row per id

# spread the distinct values of 'category' across the columns
result = df.pivot(columns=['category'])
print(result)
Result:
count
category comedy drama other
id
1 5.0 7.0 NaN
2 NaN 4.0 NaN
3 6.0 NaN 10.0
In response to your comment, if you don't want the id as an index for the df, you can tell the operation to use it as the index for the pivot. You'll need pivot_table instead of pivot to achieve this, as it can handle duplicate values for one pivoted index/column pair.
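To make the point about duplicates concrete, here is a minimal sketch. The data is made up for illustration (uid 1 appears twice with the same category), and summing the duplicates is an assumption about what you would want; pivot_table averages them by default.

```python
import pandas as pd

# Hypothetical data: uid 1 has two "comedy" rows, i.e. a duplicate
# (index, column) pair.
df = pd.DataFrame({
    "uid": [1, 1, 2],
    "category": ["comedy", "comedy", "drama"],
    "count": [5, 3, 4],
})

try:
    # pivot has no rule for choosing between the two values, so it raises
    df.pivot(index="uid", columns="category", values="count")
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates the duplicates instead (here by summing them)
table = df.pivot_table(index="uid", columns="category", values="count",
                       aggfunc="sum", fill_value=0)
print(table)
```

If every (uid, category) pair is guaranteed to be unique, plain pivot with index='uid' should work as well.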
And replacing the NaN with zeroes is also an option:
df = DataFrame([
    {'uid': 1, 'category': 'comedy', 'count': 5},
    {'uid': 1, 'category': 'drama', 'count': 7},
    {'uid': 2, 'category': 'drama', 'count': 4},
    {'uid': 3, 'category': 'other', 'count': 10},
    {'uid': 3, 'category': 'comedy', 'count': 6}
])

# uid is passed as the pivot index; missing combinations are filled with 0
result = df.pivot_table(columns=['category'], index='uid', fill_value=0)
print(result)
However, note that the resulting table still has uid as its index. If that's not what you want, you can turn the index back into a regular column:
result = df.pivot_table(columns=['category'], index='uid', fill_value=0).reset_index()
The final result:
uid count
category comedy drama other
0 1 5 7 0
1 2 0 4 0
2 3 6 0 10
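The result above still has two header levels ("count" on top, the category below), while the question asked for flat names such as cat_comedy. A possible way to get there is sketched below; the "cat_" prefix and the rename of uid to id are taken from the desired output in the question, and using 'sum' as the aggregation is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "uid": [1, 1, 2, 3, 3],
    "category": ["comedy", "drama", "drama", "other", "comedy"],
    "count": [5, 7, 4, 10, 6],
})

wide = df.pivot_table(columns="category", index="uid",
                      aggfunc="sum", fill_value=0)

# The columns are a two-level MultiIndex such as ("count", "comedy");
# keep only the category level and add the "cat_" prefix.
wide.columns = [f"cat_{category}" for _, category in wide.columns]

# Move uid out of the index and rename it to match the requested header.
wide = wide.reset_index().rename(columns={"uid": "id"})
print(wide)
```

This flattening only makes sense while "count" is the single value column; with more value columns the top level would have to be kept in the names to avoid collisions.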
The original answer from @Grismar (upvoted since he got it in first) is really close but doesn't quite work. Don't reset your index before the pivot call, and then do the following:
df2 = df.pivot_table(columns='category', index='uid', aggfunc='sum')
df2 = df2.fillna(0).reset_index()
df2 is now the dataframe you want. The fillna function replaces all the NaNs with 0s.
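One side effect worth knowing: because NaN is a float, the pivoted columns are floats, and fillna(0) leaves them that way (5.0 rather than 5). A small sketch of casting back to integers follows; it assumes every count is a whole number, which holds for the sample data but may not hold for yours.

```python
import pandas as pd

df = pd.DataFrame({
    "uid": [1, 1, 2, 3, 3],
    "category": ["comedy", "drama", "drama", "other", "comedy"],
    "count": [5, 7, 4, 10, 6],
})

df2 = df.pivot_table(columns="category", index="uid", aggfunc="sum")

# The missing combinations are NaN, which forces float columns;
# fillna(0) replaces the NaN values but keeps the float dtype.
filled = df2.fillna(0)

# Cast back to integers (only safe when all counts are whole numbers),
# then move uid out of the index.
as_int = filled.astype(int)
result = as_int.reset_index()
print(result)
```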
Complete solution using pivot_table:
import pandas as pd

df = pd.DataFrame([
    {'uid': 1, 'category': 'comedy', 'count': 5},
    {'uid': 1, 'category': 'drama', 'count': 7},
    {'uid': 2, 'category': 'drama', 'count': 4},
    {'uid': 3, 'category': 'other', 'count': 10},
    {'uid': 3, 'category': 'comedy', 'count': 6}
])

df.pivot_table(
    columns='category',
    index='uid',
    aggfunc='sum',  # the string name avoids the deprecation warning newer pandas emits for the builtin sum
    fill_value=0
)
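The question mentions a few dozen other columns. Without a values argument, pivot_table aggregates every remaining column it can, so the result may be much wider than intended. The sketch below uses a made-up "rating" column as a stand-in for those extra columns and restricts the pivot to count.

```python
import pandas as pd

# "rating" is a hypothetical extra column, not part of the original data.
df = pd.DataFrame({
    "uid": [1, 1, 2, 3, 3],
    "category": ["comedy", "drama", "drama", "other", "comedy"],
    "count": [5, 7, 4, 10, 6],
    "rating": [3.5, 4.0, 2.0, 5.0, 1.5],
})

# No values argument: both count and rating are pivoted.
everything = df.pivot_table(columns="category", index="uid",
                            aggfunc="sum", fill_value=0)
print(everything.columns.get_level_values(0).unique().tolist())

# values='count' keeps only the column of interest and, because a single
# name is passed, the result has plain one-level columns.
only_count = df.pivot_table(columns="category", index="uid",
                            values="count", aggfunc="sum", fill_value=0)
print(only_count.columns.tolist())
```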