Pandas: How do I set index on the columns of an existing DataFrame?
I am quite new to pandas. I have 10 different types of data for different firms in 10 DataFrames, e.g. Total Assets, AUM, etc.
Each type of data has either high or low importance: H or L.
Each type of data also belongs to one of 3 categories: Cat1, Cat2, Cat3.
For H importance, I need to analyse the data by the 3 categories. The same goes for L importance.
I am thinking of adding a MultiIndex to the columns after merging the 10 DataFrames. Is that possible?
Current State
**df_1**
|Total Assets|
Firm 1| 100 |
Firm 2| 200 |
Firm 3| 300 |
**df_2**
|AUMS |
Firm 1| 300 |
Firm 2| 3400 |
Firm 3| 800 |
Firm 4| 800 |
and so on until df_10. Note that the set of firms can differ from one DataFrame to the next.
Desired Output
**Merged_df**
Importance| L | H |
Category | Cat1 | Cat2 |
|Total Assets| AUMs |
Firm 1 | 100 | 300 |
Firm 2 | 200 | 3400 |
Firm 3 | 300 | 800 |
Firm 4 | NaN | 800 |
Next, I need to group by "Importance" and "Category". Any solution other than a MultiIndex is also welcome. Thank you!
We can concat on axis=1 with MultiIndex keys:
dfs = [df1, df2]
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        ['L', 'H'],       # Top Level Keys
        ['Cat1', 'Cat2']  # Second Level Keys
    ], names=['Importance', 'Category'])
)
merged_df:
Importance            L    H
Category           Cat1 Cat2
           Total Assets AUMS
Firm 1            100.0  300
Firm 2            200.0 3400
Firm 3            300.0  800
Firm 4              NaN  800
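With the Importance and Category levels on the columns, the grouping asked about in the question can be done on those level names. The following is a minimal sketch and not part of the original answer. It rebuilds merged_df with tuple keys and the names argument of pd.concat, which labels the levels in the same way, and the use of sum as the aggregation is only an assumption for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})

# One (Importance, Category) tuple per DataFrame, in the same order as the list
merged_df = pd.concat(
    [df1, df2], axis=1,
    keys=[('L', 'Cat1'), ('H', 'Cat2')],
    names=['Importance', 'Category'],
)

# Select all high-importance columns; the Importance level is dropped
high = merged_df.xs('H', axis=1, level='Importance')

# Aggregate per (Importance, Category): transpose so the groups sit on the
# rows, group by level name, then transpose back.
# min_count=1 keeps NaN where a firm has no data at all in a group.
group_totals = (
    merged_df.T
    .groupby(level=['Importance', 'Category'])
    .sum(min_count=1)
    .T
)
print(group_totals)
```

Transposing before grouping avoids the axis argument of DataFrame.groupby, which is deprecated in recent pandas releases.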
CategoricalDtype can be used to establish ordering:
dfs = [df1, df2]
# Specify Categorical Types
# These lists should contain _only_ the unique categories
# in the desired order
importance_type = pd.CategoricalDtype(categories=['H', 'L'], ordered=True)
category_type = pd.CategoricalDtype(categories=['Cat1', 'Cat2'], ordered=True)
# Keys should contain the _complete_ list of _all_ columns
merged_df = pd.concat(
    dfs, axis=1,
    keys=pd.MultiIndex.from_arrays([
        pd.Series(['L', 'H'],        # Top Level Keys
                  dtype=importance_type),
        pd.Series(['Cat1', 'Cat2'],  # Second Level Keys
                  dtype=category_type)
    ], names=['Importance', 'Category'])
)
Then sort_index can be used and it will work as expected: H before L, etc.
# Sorting Now Works As Expected
merged_df = merged_df.sort_index(level=[0, 1], axis=1)
merged_df:
Importance    H            L
Category   Cat2         Cat1
           AUMS Total Assets
Firm 1      300        100.0
Firm 2     3400        200.0
Firm 3      800        300.0
Firm 4      800          NaN
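Because H also comes before L alphabetically, the output above does not by itself show what the ordered categorical adds. As a small illustration (not from the original answer), the sketch below declares the opposite order, L before H, on a plain Series and sorts it. The values are made up for the demonstration, and the point is only that an ordered CategoricalDtype sorts by the declared category order rather than alphabetically.

```python
import pandas as pd

# Declared order: L first, then H (the reverse of alphabetical order)
importance_type = pd.CategoricalDtype(categories=['L', 'H'], ordered=True)

# Hypothetical importance labels, for illustration only
labels = pd.Series(['H', 'L', 'H'], dtype=importance_type)

# Sorting follows the declared category order, so L comes first
sorted_labels = labels.sort_values().tolist()
print(sorted_labels)

# The same labels as plain strings sort alphabetically, so H comes first
plain_sorted = labels.astype(str).sort_values().tolist()
print(plain_sorted)
```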
DataFrames:
import pandas as pd
df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})
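Since the question also welcomes solutions that avoid a MultiIndex, one possible alternative is a long ("tidy") layout: keep the metrics in a single Value column and attach Importance and Category from a small lookup table. This is only a sketch under assumptions. The meta table, the column names Firm, Metric and Value, and the choice of mean as the aggregation are all hypothetical and not from the original post.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Total Assets': {'Firm 1': 100, 'Firm 2': 200, 'Firm 3': 300}
})
df2 = pd.DataFrame({
    'AUMS': {'Firm 1': 300, 'Firm 2': 3400, 'Firm 3': 800, 'Firm 4': 800}
})

# Hypothetical lookup table: one row per metric with its labels
meta = pd.DataFrame({
    'Metric': ['Total Assets', 'AUMS'],
    'Importance': ['L', 'H'],
    'Category': ['Cat1', 'Cat2'],
})

# Align the frames on firm name; firms missing from a frame become NaN
wide = pd.concat([df1, df2], axis=1).rename_axis('Firm')

# Reshape to one row per (Firm, Metric) and attach the labels
long_df = (
    wide.reset_index()
    .melt(id_vars='Firm', var_name='Metric', value_name='Value')
    .merge(meta, on='Metric', how='left')
)

# Ordinary row-wise groupby; NaN values are skipped by mean
summary = long_df.groupby(['Importance', 'Category'])['Value'].mean()
print(summary)
```

In this layout, adding the remaining DataFrames only means extending the list passed to pd.concat and adding one row per metric to the lookup table.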