Sum the columns of each row and add a new column in a MultiIndex pandas DataFrame

Question

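I am trying to sum the columns of each row in a MultiIndex pandas DataFrame and add the computed value as a new column.

The dataset I am using is the "flights" dataset from the seaborn library.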


import pandas as pd
import seaborn

# Load dataset from seaborn library
flights = seaborn.load_dataset('flights')

# !!!EDIT - I added this line because it was missing!!!
# Set index for the loaded dataframe
flights_indexed = flights.set_index(['year','month'])

# Unstack the dataframe and create a column for each month
flights_unstacked = flights_indexed.unstack()

# Compute sum of each row
sum_row = flights_unstacked.sum(axis=1)
sum_row_reshape = sum_row.values.reshape(12,1)


### Put the sum of each row in a new column ###
flights_unstacked['passengers','total'] = sum_row

# alternatively,
flights_unstacked['passengers','total'] = sum_row_reshape

Both approaches above raise:

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

Can anyone help?

Answer 1

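The problem stems from your month column having dtype "category". When you unstack, the month level of the columns becomes a CategoricalIndex, and a new label such as 'total' is not one of its categories. You should convert the column to type "str". Then your code should work: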

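import seaborn

df = seaborn.load_dataset('flights')
print(df.dtypes)  # shows that 'month' has dtype category

# Convert the categorical month column to plain strings
df['month'] = df['month'].astype(str)
df.set_index(['year', 'month'], inplace=True)

# Remember the original month order before unstacking
months = df.index.unique(level=1)
df_unstacked = df.unstack()

# order of months is lost when using unstack, hence reindex
df_unstacked = df_unstacked.reindex(months, axis=1, level=1)

# Add the row-wise sum as a new column
df_unstacked['passengers', 'sum'] = df_unstacked.sum(axis=1)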
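
To try the same idea without downloading the seaborn data, here is a minimal sketch on a made-up three-month frame. The values and the names `months` and `wide` are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Hypothetical miniature stand-in for the seaborn "flights" dataset
months = ["Jan", "Feb", "Mar"]
df = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    "month": pd.Categorical(months * 2, categories=months),
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Replace the categorical month column with plain strings
df["month"] = df["month"].astype(str)

# One row per year, one column per month
wide = df.set_index(["year", "month"]).unstack()

# Plain strings are sorted alphabetically by unstack, so ask for calendar order again
wide = wide.reindex(months, axis=1, level=1)

# The month level is no longer categorical, so a new label can be added
wide["passengers", "total"] = wide.sum(axis=1)
print(wide)
```

The sum is taken before the new column exists, so the total only covers the month columns.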

Answer 2

# Unstack the dataframe and create a column for each month
flights_unstacked = flights_indexed.unstack()

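The line above would create just 1 column with 432 rows. Are you trying to create 2 columns? Also, the data has 3 columns: year, month and passengers. While year and passengers hold integer values, month holds the month names as strings. Unstacking would leave you with the months in flights_unstacked, so you might have to drop them. Is it really necessary to unstack the dataset? Also, if you could post the desired result, it would help in understanding and answering your question better.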
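
If only the per-year totals are needed, one possible way to avoid unstacking altogether is a groupby on the long format. This is a sketch on made-up data shaped like the flights dataset; the values and the names `totals` and `total` are assumptions for illustration:

```python
import pandas as pd

# Made-up long-format data shaped like the flights dataset (illustrative values)
flights = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar"] * 2,
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Sum passengers per year directly on the long format, no unstack needed
totals = flights.groupby("year")["passengers"].sum()
print(totals)

# Optionally attach the yearly total to every row of the long frame
flights["total"] = flights.groupby("year")["passengers"].transform("sum")
```

Whether this fits depends on the desired output: it yields one total per year, not the wide year-by-month table from the question.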


Answer 1
2 accepted 2019-06-14 16:14:23

Answer 2
0 2019-06-14 15:50:09
