Sum the columns of each row and add a new column in a MultiIndex pandas DataFrame

Question

I am trying to sum the columns of each row in a multi-level pandas DataFrame and add the computed values as a new column.

The dataset I am using is the "flights" dataset from the seaborn library.


import pandas as pd
import seaborn

# Load dataset from seaborn library
flights = seaborn.load_dataset('flights')

# !!!EDIT - I added this line because it was missing!!!
# Set index for the loaded dataframe
flights_indexed = flights.set_index(['year','month'])

# Unstack the dataframe and create columns for each month
flights_unstacked = flights_indexed.unstack()

# Compute sum of each row
sum_row = flights_unstacked.sum(axis=1)
sum_row_reshape = sum_row.values.reshape(12,1)


### Put the sum of each row in a new column ###
flights_unstacked['passengers','total'] = sum_row

# alternatively,
flights_unstacked['passengers','total'] = sum_row_reshape

Both approaches above raise:

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

Can anyone help?
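To see where the error comes from without downloading anything, here is a minimal sketch on a small hand-made frame that mimics the shape of seaborn's flights data (the values and the three-month subset are assumptions, not the real dataset). It checks the dtype of `month` and of the month level left in the columns after `unstack()`: that level is a `CategoricalIndex`, and `total` is not one of its categories. Depending on the pandas version, inserting such a label may raise the TypeError above or instead fall back to an object-dtype level.

```python
import pandas as pd

# Hypothetical stand-in for seaborn's flights data (no download needed):
# same column names, with 'month' stored as a categorical.
month_order = ["Jan", "Feb", "Mar"]
flights = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": pd.Categorical(month_order * 2, categories=month_order),
    "passengers": [112, 118, 132, 115, 126, 141],
})

print(flights.dtypes)  # 'month' is reported as 'category'

flights_unstacked = flights.set_index(["year", "month"]).unstack()

# The month level of the new column MultiIndex keeps the categorical dtype,
# so a label such as 'total' is not among its categories.
month_level = flights_unstacked.columns.levels[1]
print(type(month_level).__name__, list(month_level.categories))
print("total" in month_level.categories)
```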

Answer 1

The problem is that your month column has the dtype "category". Convert it to "str" and your code should then work:

import seaborn
df = seaborn.load_dataset('flights')
print(df.dtypes)
df['month'] = df['month'].astype(str)
df.set_index(['year', 'month'], inplace=True)
months = df.index.unique(1)
df_unstacked = df.unstack()
# order of months is lost when using unstack, hence reindex
df_unstacked = df_unstacked.reindex(months, axis=1, level=1)
df_unstacked['passengers', 'sum'] = df_unstacked.sum(axis=1)
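The same fix can be tried offline on a hand-made stand-in for the flights data (hypothetical values, only three months). This sketch follows the steps above: cast `month` to strings, unstack, restore the calendar order with `reindex(..., level=1)`, then add the row sums as a `('passengers', 'total')` column.

```python
import pandas as pd

# Hypothetical stand-in for seaborn's flights data (three months only).
month_order = ["Jan", "Feb", "Mar"]
df = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": pd.Categorical(month_order * 2, categories=month_order),
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Plain strings instead of categories, so new column labels can be inserted
df["month"] = df["month"].astype(str)
df = df.set_index(["year", "month"])
months = df.index.unique(level=1)  # months in order of appearance

wide = df.unstack()                           # columns: ('passengers', month), sorted alphabetically
wide = wide.reindex(months, axis=1, level=1)  # put the months back in calendar order
wide["passengers", "total"] = wide.sum(axis=1)  # row-wise sum as a new column

print(wide)
```

Casting to `str` gives up the categorical ordering, which is why the explicit `reindex` step is needed afterwards.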

Answer 2

# Unstack the dataframe and create columns for each month
flights_unstacked = flights_indexed.unstack()

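The line above only creates 1 column with 432 rows. Are you trying to create 2 columns? The data has 3 columns: year, month and passengers. Year and passengers hold integer values, while month holds the month names as strings. Unstacking will leave those month strings in flights_unstacked, so you may have to drop them. Do you really need to unstack the dataset? Also, posting your desired output would help us understand and answer your question.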
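If the wide table is not actually needed, the row totals of the unstacked table are just per-year sums, so they can be computed directly on the long frame with `groupby`. A minimal sketch, again on a hypothetical three-month stand-in for flights; no categorical column level is involved at any point.

```python
import pandas as pd

# Hypothetical stand-in for seaborn's flights data (three months only).
month_order = ["Jan", "Feb", "Mar"]
flights = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": pd.Categorical(month_order * 2, categories=month_order),
    "passengers": [112, 118, 132, 115, 126, 141],
})

# One total per year, computed on the long (not unstacked) frame
yearly_total = flights.groupby("year")["passengers"].sum()
print(yearly_total)
```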

2 answers

Answer 1
2 votes, accepted, 2019-06-14 16:14:23

Answer 2
0 votes, 2019-06-14 15:50:09
