Pandas: How to set an index on the columns of an existing DataFrame?

Question

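I am very new to pandas. Basically, I have 10 different types of data for different firms, held in 10 dfs, e.g. total assets, AUM, etc.
Each type of data can have high or low importance: H or L.
Each type of data can fall into one of 3 categories: Cat1, Cat2, Cat3.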

For H importance, I need to analyse the data by the 3 categories. The same goes for L importance.

I am thinking of adding a multi-index to each column of data after merging the 10 dfs. Is that possible?

Current state


**df_1**

      |Total Assets|
Firm 1| 100        |
Firm 2| 200        |
Firm 3| 300        |

**df_2**

      |AUMS    |
Firm 1| 300    |
Firm 2| 3400   |
Firm 3| 800    |
Firm 4| 800    |

And so on until df_10. Also, the firms may differ from one df to another.

Expected output

**Merged_df**

Importance| L          | H    |
Category  | Cat1       | Cat2 |
          |Total Assets| AUMs |
Firm 1    | 100        | 300  |
Firm 2    | 200        | 3400 |
Firm 3    | 300        | 800  |
Firm 4    | NaN        | 800  |

Next, I need to do a groupby on "Importance" and "Category". Any solution other than a multi-index is also welcome. Thank you!

Answer 1

We can concat on axis=1 with MultiIndex keys:

# df1 and df2 are defined at the end of this answer
dfs = [df1, df2]
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        ['L', 'H'],       # Top Level Keys
        ['Cat1', 'Cat2']  # Second Level Keys
    ], names=['Importance', 'Category'])
)

merged_df:

Importance            L     H
Category           Cat1  Cat2
           Total Assets  AUMS
Firm 1            100.0   300
Firm 2            200.0  3400
Firm 3            300.0   800
Firm 4              NaN   800
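
With ten DataFrames, typing the two key lists by hand can get error-prone. One option, sketched here and not part of the original answer, is to keep a lookup from column name to its (Importance, Category) pair and build the keys from it. The `labels` dict is a hypothetical helper, and the sketch assumes each DataFrame holds exactly one column:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})
dfs = [df1, df2]

# Hypothetical lookup: column name -> (Importance, Category)
labels = {
    'Total Assets': ('L', 'Cat1'),
    'AUMS': ('H', 'Cat2'),
}

# One key per DataFrame, in the same order as dfs
# (assumes a single column per DataFrame)
keys = pd.MultiIndex.from_tuples(
    [labels[df.columns[0]] for df in dfs],
    names=['Importance', 'Category']
)

merged_df = pd.concat(dfs, axis=1, keys=keys)
```

Extending this to all ten DataFrames would only require adding entries to `labels` and to `dfs`.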

A CategoricalDtype can be used to establish ordering:

dfs = [df1, df2]
# Specify Categorical Types
# These lists should contain _only_ the unique categories
# in the desired order
importance_type = pd.CategoricalDtype(categories=['H', 'L'], ordered=True)
category_type = pd.CategoricalDtype(categories=['Cat1', 'Cat2'], ordered=True)


# Keys should contain the _complete_ list of _all_ columns
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        pd.Series(['L', 'H'],            # Top Level Keys
                  dtype=importance_type),
        pd.Series(['Cat1', 'Cat2'],      # Second Level Keys
                  dtype=category_type)
    ], names=['Importance', 'Category'])
)

sort_index can then be used, and it will work as expected: H sorts before L, and so on.

# Sorting Now Works As Expected
merged_df = merged_df.sort_index(level=[0, 1], axis=1)

merged_df:

Importance     H            L
Category    Cat2         Cat1
            AUMS Total Assets
Firm 1       300        100.0
Firm 2      3400        200.0
Firm 3       800        300.0
Firm 4       800          NaN
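
For the grouping step the question asks about, the column levels can be addressed by name. The following is a minimal sketch, not part of the original answer. It assumes the sample DataFrames from this answer, and the per-group sum is only a placeholder for whatever aggregation is actually needed. It transposes before grouping because `groupby(axis=1)` is deprecated in recent pandas versions:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})

merged_df = pd.concat(
    [df1, df2], axis=1,
    keys=pd.MultiIndex.from_arrays([
        ['L', 'H'],       # Top Level Keys
        ['Cat1', 'Cat2']  # Second Level Keys
    ], names=['Importance', 'Category'])
)

# Select every column with high importance;
# the 'Importance' level is dropped from the result
high = merged_df.xs('H', level='Importance', axis=1)

# Aggregate per firm within each (Importance, Category) pair.
# Transpose so the column levels become row levels, group, transpose back.
# min_count=1 keeps NaN for firms with no data in a group instead of 0.
totals = (
    merged_df.T
    .groupby(level=['Importance', 'Category'])
    .sum(min_count=1)
    .T
)
```

With only one column per group, as in the sample data, `totals` simply repeats the input values. The grouping becomes useful once several columns share the same Importance and Category.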

DataFrames:

import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})

df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})
