How can I use groupby on two columns in pandas and show a count for each one?
For example, my df is:
movie_name gender
"abc" f
"abc" m
"bbb" m
I want a new df like this:
movie_name male_count female_count diff
"abc" 1 1 0
"bbb" 1 0 1
How can I achieve this?
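To try the answers below, the sample frame can be rebuilt like this. This is a sketch that assumes movie_name holds plain strings (the quotes in the question are taken to be display only) and that the frame is named df, as in the answers.

```python
import pandas as pd

# Sample data from the question.
# Assumption: movie names are plain strings; the quotes shown in the
# question are only how the values were displayed.
df = pd.DataFrame(
    {
        "movie_name": ["abc", "abc", "bbb"],
        "gender": ["f", "m", "m"],
    }
)
print(df)
```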
Another solution, using .pivot_table():
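import pandas as pd

# Count rows per (movie_name, gender) pair; the missing (bbb, f) pair
# comes out as NaN, hence fillna(0) and the cast back to int.
df_out = (
    df.pivot_table(index="movie_name", columns="gender", aggfunc="size")
    .fillna(0)
    .astype(int)
    .rename(columns={"m": "male_count", "f": "female_count"})
)
# Male minus female, as in the requested output.
df_out["diff"] = df_out["male_count"] - df_out["female_count"]
print(df_out)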
Prints:
gender female_count male_count diff
movie_name
"abc" 1 1 0
"bbb" 0 1 1
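pivot_table also accepts a fill_value argument, which fills the missing (bbb, f) cell during the pivot instead of in a separate .fillna(0) step. A minimal sketch on the question's sample data (rebuilt here, assuming the names are plain strings); .astype(int) is kept to be safe about the resulting dtype:

```python
import pandas as pd

# Sample data from the question (plain strings assumed).
df = pd.DataFrame({"movie_name": ["abc", "abc", "bbb"], "gender": ["f", "m", "m"]})

# fill_value=0 fills the absent (bbb, f) combination as part of the pivot,
# so no separate .fillna(0) call is needed.
df_out = (
    df.pivot_table(index="movie_name", columns="gender", aggfunc="size", fill_value=0)
    .astype(int)
    .rename(columns={"m": "male_count", "f": "female_count"})
)
df_out["diff"] = df_out["male_count"] - df_out["female_count"]
print(df_out)
```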
Use groupby together with unstack():
# Count each (movie_name, gender) pair, then pivot gender into columns.
df1 = (
    df.groupby(['movie_name', 'gender'])['gender']
    .count()
    .unstack(1, fill_value=0)
    .rename(columns={'f': 'female', 'm': 'male'})
    .add_suffix('_count')
)
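The chain above can be traced step by step. This sketch, again on the rebuilt sample data (plain strings assumed), prints the intermediate count Series and the unstacked frame before any renaming:

```python
import pandas as pd

# Sample data from the question (plain strings assumed).
df = pd.DataFrame({"movie_name": ["abc", "abc", "bbb"], "gender": ["f", "m", "m"]})

# Step 1: one count per (movie_name, gender) pair, as a Series with a
# two-level MultiIndex. The (bbb, f) pair is absent because it never occurs.
counts = df.groupby(["movie_name", "gender"])["gender"].count()
print(counts)

# Step 2: move index level 1 (gender) into the columns; fill_value=0
# fills the missing (bbb, f) cell instead of leaving NaN.
wide = counts.unstack(1, fill_value=0)
print(wide)
```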
Then use .map for the diff column; there is probably a more elegant way to do this.
# Stack the counts, diff within each movie (male_count - female_count), map back.
df1['diff'] = df1.index.map(
    df1.stack().reset_index(1, drop=True)
    .groupby(level=0).diff().dropna()
)
gender female_count male_count diff
movie_name
abc 1 1 0.0
bbb 0 1 1.0
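Since female_count and male_count are already columns of df1 sharing the same movie_name index, a plain column subtraction is one simpler alternative to the .map approach, and it keeps the integer dtype instead of producing 0.0/1.0. A sketch under the same sample-data assumption:

```python
import pandas as pd

# Sample data from the question (plain strings assumed).
df = pd.DataFrame({"movie_name": ["abc", "abc", "bbb"], "gender": ["f", "m", "m"]})

df1 = (
    df.groupby(["movie_name", "gender"])["gender"]
    .count()
    .unstack(1, fill_value=0)
    .rename(columns={"f": "female", "m": "male"})
    .add_suffix("_count")
)

# Both count columns are aligned on the same movie_name index, so plain
# column subtraction gives male minus female row by row.
df1["diff"] = df1["male_count"] - df1["female_count"]
print(df1)
```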
Here is a solution with crosstab:
import pandas as pd

out = pd.crosstab(index=df["movie_name"], columns=df["gender"])
out["diff"] = out["m"] - out["f"]  # male minus female
print(out)
gender f m diff
movie_name
abc 1 1 0
bbb 0 1 1
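If the exact layout from the question is wanted (male_count, female_count, diff, in that order), the crosstab result can be renamed and reordered afterwards. A sketch, assuming the same sample data with plain-string names:

```python
import pandas as pd

# Sample data from the question (plain strings assumed).
df = pd.DataFrame({"movie_name": ["abc", "abc", "bbb"], "gender": ["f", "m", "m"]})

out = pd.crosstab(index=df["movie_name"], columns=df["gender"])
out["diff"] = out["m"] - out["f"]

# Rename and reorder to match the requested layout:
# male_count, female_count, diff.
out = out.rename(columns={"m": "male_count", "f": "female_count"})[
    ["male_count", "female_count", "diff"]
]
out.columns.name = None  # drop the "gender" label left on the column index
print(out)
```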