Pandas: How do I set index on the columns of an existing DataFrame?

I am quite new to pandas. Basically, I have 10 different types of data for different firms in 10 dfs, e.g. Total Assets, AUM, etc.
Each type of data can have high or low importance: H or L.
Each type of data can fall into one of 3 categories: Cat1, Cat2, Cat3.

For H importance, I need to analyse the data by the 3 categories. Same for L importance.

I am thinking of adding a multi-index for each column of data after merging the 10 dfs. Is that possible?

Current State


**df_1**

      |Total Assets|
Firm 1| 100        |
Firm 2| 200        |
Firm 3| 300        |

**df_2**

      |AUMS    |
Firm 1| 300    |
Firm 2| 3400   |
Firm 3| 800    |
Firm 4| 800    |

And so on until df_10. The firms in each df could also differ.


Desired Output

**Merged_df**

Importance| L            | H    |
Category  | Cat1         | Cat2 |
          | Total Assets | AUMs |
Firm 1    | 100          | 300  |
Firm 2    | 200          | 3400 |
Firm 3    | 300          | 800  |
Firm 4    | NaN          | 800  |


Next, I need to do a groupby on "Importance" and "Category". Any other solution besides multi-indexing is welcome. Thank you!

We can concat on axis=1 with MultiIndex keys:

dfs = [df1, df2]
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        ['L', 'H'],       # Top Level Keys
        ['Cat1', 'Cat2']  # Second Level Keys
    ], names=['Importance', 'Category'])
)

merged_df:

Importance            L     H
Category           Cat1  Cat2
           Total Assets  AUMS
Firm 1            100.0   300
Firm 2            200.0  3400
Firm 3            300.0   800
Firm 4              NaN   800
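Once the columns carry the Importance and Category levels, subsets can be pulled out by level. The following is a minimal sketch using the two sample frames from this answer; the variable names `high` and `cat1` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})

merged_df = pd.concat(
    [df1, df2], axis=1,
    keys=pd.MultiIndex.from_arrays([
        ['L', 'H'],       # Top Level Keys
        ['Cat1', 'Cat2']  # Second Level Keys
    ], names=['Importance', 'Category'])
)

# All columns with high importance (the 'Importance' level is dropped)
high = merged_df['H']

# All columns in Cat1, regardless of importance
cat1 = merged_df.xs('Cat1', level='Category', axis=1)
```

Selecting on the top level works with plain indexing, while `xs` is needed to select on an inner level such as Category.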

CategoricalDtype can be used to establish ordering:

dfs = [df1, df2]
# Specify Categorical Types
# These lists should contain _only_ the unique categories
# in the desired order
importance_type = pd.CategoricalDtype(categories=['H', 'L'], ordered=True)
category_type = pd.CategoricalDtype(categories=['Cat1', 'Cat2'], ordered=True)


# Keys should contain the _complete_ list of _all_ columns
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        pd.Series(['L', 'H'],            # Top Level Keys
                  dtype=importance_type),
        pd.Series(['Cat1', 'Cat2'],      # Second Level Keys
                  dtype=category_type)
    ], names=['Importance', 'Category'])
)

Then sort_index can be used and it will work as expected: H before L, etc.

# Sorting Now Works As Expected
merged_df = merged_df.sort_index(level=[0, 1], axis=1)

merged_df:

Importance     H            L
Category    Cat2         Cat1
            AUMS Total Assets
Firm 1       300        100.0
Firm 2      3400        200.0
Firm 3       800        300.0
Firm 4       800          NaN

DataFrames:

import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})

df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})
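
The grouping step mentioned in the question could be sketched as follows. This is only one possible approach, under some assumptions: `df3` ("Revenue") is a hypothetical third frame added so that one group holds two columns, and `labels` is a hypothetical lookup from column name to (Importance, Category). As a variation on the concat keys shown above, the MultiIndex is assigned after a plain concat, which may be easier to maintain with 10 frames.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})
# Hypothetical third frame, added only so one group holds two columns
df3 = pd.DataFrame({'Revenue': {'Firm 1': 10, 'Firm 2': 20}})

# Hypothetical lookup: column name -> (Importance, Category)
labels = {
    'Total Assets': ('L', 'Cat1'),
    'AUMS': ('H', 'Cat2'),
    'Revenue': ('L', 'Cat1'),
}

# Plain concat first, then attach the extra column levels
merged_df = pd.concat([df1, df2, df3], axis=1)
merged_df.columns = pd.MultiIndex.from_tuples(
    [(*labels[col], col) for col in merged_df.columns],
    names=['Importance', 'Category', 'Item'],
)

# Group the columns: transpose so the column levels become row levels,
# group on them, aggregate, then transpose back.
totals = merged_df.T.groupby(level=['Importance', 'Category']).sum().T
```

Here `sum()` skips NaN, so a firm with no data in a group gets 0; passing `min_count=1` to `sum()` keeps NaN in that case instead.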
