How to split a dataframe into multiple dataframes based on column names
My dataframe is as below:
import pandas as pd

_dict = {
    't_head': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'r_head': ['Revenue', 'Revenue', 'Income', 'Income', 'Cash', 'Expenses'],
    '3ME__ Q219': [159.9, '', 45.6, '', '', ''],
    '3ME__ Q218': [112.3, '', 27.2, '', '', ''],
    '3ME__ Q119': [121.0, '', 23.1, '', '', ''],
    '3ME__ Q18': [85.7, '', 15.3, '', '', ''],
    '3ME__ Q418': [160.5, '', 51.1, '', '', ''],
    '9ME__ Q417': [102.6, '', 24.2, '', '', ''],
    '9ME__ Q318': [118.8, '', 30.2, '', '', ''],
    '9ME__ Q317': [79.4, '', 15.3, '', '', ''],
    '6ME__ Q219': ['', 280.9, '', 68.7, '', ''],
    '6ME__ Q218': ['', 198.0, '', 42.6, '', ''],
    'Q219': ['', '', '', '', 1305, 1239],
    'Q418': ['', '', '', '', 2072, 1117]
}
df = pd.DataFrame.from_dict(_dict)
print(df)
   t_head    r_head  3ME__ Q219  3ME__ Q218  3ME__ Q119  3ME__ Q18  3ME__ Q418  9ME__ Q417  9ME__ Q318  9ME__ Q317  6ME__ Q219  6ME__ Q218  Q219  Q418
0      H1   Revenue       159.9       112.3         121       85.7       160.5       102.6       118.8        79.4
1      H2   Revenue                                                                                                      280.9         198
2      H3    Income        45.6        27.2        23.1       15.3        51.1        24.2        30.2        15.3
3      H4    Income                                                                                                       68.7        42.6
4      H5      Cash                                                                                                                         1305  2072
5      H6  Expenses                                                                                                                         1239  1117
I want to split this dataframe into multiple dataframes based on the column headings. The column headings can start with 3ME__, 6ME__, 9ME__ (all, any, or none of these can be present) or with other values. I want all columns starting with 3ME__ to be in one dataframe, those starting with 6ME__ in another, and so on, with all of the remaining columns in a fourth dataframe.
What I have tried is as below:
df1 = df.filter(regex='3ME__')
if not df1.empty:
    df1 = df1[df1.iloc[:,0].astype(bool)]
df2 = df.filter(regex='6ME__')
if not df2.empty:
    df2 = df2[df2.iloc[:,0].astype(bool)]
df3 = df.filter(regex='9ME__')
if not df3.empty:
    df3 = df3[df3.iloc[:,0].astype(bool)]
Here I am able to filter the columns whose names start with 3ME__, 6ME__ and 9ME__ into different dataframes, but I am not able to get the rest of the columns into one dataframe.
1.) How do I get the rest of the columns into one dataframe?
2.) Is there a simpler way to split the dataframe into a dictionary with keys and dataframes as values?
Please help.
I would do the below:
m=df.set_index(['t_head','r_head']) #set the 2 columns as index
Then split the column names, group the columns by prefix on axis 1, and make a dict with each group:
d={f'df_{i}': g for i, g in m.groupby(m.columns.str.split('_').str[0],axis=1)}
Then use each key to access this dictionary:
print(d['df_3ME'])
Based on further discussion, we do the same operation but with a condition:
cond=df.columns.str.contains('__') #check if cols have double _
d={f'df_{i}':g for i, g in
   df.loc[:,cond].groupby(df.loc[:,cond].columns.str.split('__').str[0],axis=1)}
d.update({'Misc':df.loc[:,~cond]}) #update the dict with all that doesnt meet condition
print(d['df_3ME'])
   3ME__ Q219  3ME__ Q218  3ME__ Q119  3ME__ Q18  3ME__ Q418
0       159.9       112.3         121       85.7       160.5
1
2        45.6        27.2        23.1       15.3        51.1
3
4
5
print(d['Misc'])
   t_head    r_head  Q219  Q418
0      H1   Revenue
1      H2   Revenue
2      H3    Income
3      H4    Income
4      H5      Cash  1305  2072
5      H6  Expenses  1239  1117
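As far as I know, recent pandas releases (2.1 and later) emit a deprecation warning for `groupby(..., axis=1)`. A minimal sketch of the same prefix-based split without it is shown below. The sample frame is a cut-down version of the question's data, and selecting columns with `str.startswith` is a swapped-in technique, not the code from the answer above.

```python
import pandas as pd

# Cut-down sample of the question's data (assumption: same column naming scheme).
df = pd.DataFrame({
    't_head': ['H1', 'H2'],
    'r_head': ['Revenue', 'Revenue'],
    '3ME__ Q219': [159.9, ''],
    '3ME__ Q218': [112.3, ''],
    '6ME__ Q219': ['', 280.9],
    'Q219': ['', ''],
})

# Columns whose names contain a double underscore carry a period prefix.
cond = df.columns.str.contains('__')
prefixes = df.columns[cond].str.split('__').str[0].unique()

# One dataframe per prefix, selected with a boolean column mask.
d = {
    f'df_{prefix}': df.loc[:, df.columns.str.startswith(prefix + '__')]
    for prefix in prefixes
}
# Everything without a prefix goes under 'Misc', as in the answer above.
d['Misc'] = df.loc[:, ~cond]

print(list(d))
```

The dictionary keys follow the `df_<prefix>` naming used above, so `d['df_3ME']` and `d['Misc']` are accessed the same way.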
You can retrieve the column names of the dataframes you created and select the columns that are not in any of them:
other_columns = [x for x in df.columns if x not in (list(df1.columns) + list(df2.columns) + list(df3.columns))]
other_df = df[other_columns]
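The same selection can also be written as a single boolean mask over the column names, which avoids building `df1`, `df2` and `df3` first. This is only a sketch on a cut-down sample frame; the variable name `is_period_col` is made up for illustration. Note that `df.filter(regex='3ME__')` searches anywhere in the name, whereas `str.match` only matches at the start of the name.

```python
import pandas as pd

# Cut-down sample of the question's data.
df = pd.DataFrame({
    't_head': ['H5', 'H6'],
    'r_head': ['Cash', 'Expenses'],
    '3ME__ Q219': ['', ''],
    '9ME__ Q417': ['', ''],
    'Q219': [1305, 1239],
    'Q418': [2072, 1117],
})

# True for names starting with 3ME__, 6ME__ or 9ME__
# (str.match anchors the pattern at the beginning of each name).
is_period_col = df.columns.str.match(r'[369]ME__')

# Keep every column that does not carry one of those prefixes.
other_df = df.loc[:, ~is_period_col]
print(other_df)
```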
You can also try it like this:
k = list(df1.columns)+ list(df2.columns)+ list(df3.columns)
df = df.drop(k, axis=1)
print(df)
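Note that the snippet above overwrites `df`. If the original frame is still needed afterwards, a sketch like the following keeps it intact by assigning the result to a new name (`rest` is a hypothetical variable name, and the sample frame is a cut-down version of the question's data):

```python
import pandas as pd

# Cut-down sample of the question's data.
df = pd.DataFrame({
    't_head': ['H5'],
    '3ME__ Q219': [''],
    '6ME__ Q219': [''],
    'Q219': [1305],
})

df1 = df.filter(regex='^3ME__')
df2 = df.filter(regex='^6ME__')
df3 = df.filter(regex='^9ME__')  # no match here: a frame with zero columns

k = list(df1.columns) + list(df2.columns) + list(df3.columns)

# drop returns a new dataframe; df itself is left unchanged.
rest = df.drop(columns=k)
print(rest)
```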
A combination of all the above got me what I was looking for:
def _split_dataframes(df):
    df = df.set_index(['t_head','r_head'])
    final_dict_key = 0
    final_dict = {}
    names_list = []
    for elems in ['3ME__','6ME__','9ME__','other']:
        if elems != 'other':
            # columns for this prefix, keeping only rows with no empty cells
            temp_df = df.filter(regex=elems)
            temp_df = temp_df.loc[(temp_df!='').all(axis=1)]
            names_list.extend(list(temp_df.columns))
            if not temp_df.empty:
                temp_df.reset_index(inplace=True)
                final_dict[str(final_dict_key)] = temp_df
                final_dict_key+= 1
        else:
            # everything that was not matched by one of the prefixes
            df.drop(names_list, axis=1,inplace=True)
            df = df.loc[(df!='').all(axis=1)]
            if not df.empty:
                df.reset_index(inplace=True)
                final_dict[str(final_dict_key)] = df
    return final_dict
This splits the main dataframe and saves the pieces to a dictionary with an incremental key, as below:
{
'0':
   t_head    r_head  3ME__ Q219  3ME__ Q218  3ME__ Q119  3ME__ Q18  3ME__ Q418
0      H1   Revenue       159.9       112.3         121       85.7       160.5
1      H3    Income        45.6        27.2        23.1       15.3        51.1,
'1':
   t_head    r_head  6ME__ Q219  6ME__ Q218
0      H2   Revenue       280.9         198
1      H4    Income        68.7        42.6,
'2':
   t_head    r_head  9ME__ Q417  9ME__ Q318  9ME__ Q317
0      H1   Revenue       102.6       118.8        79.4
1      H3    Income        24.2        30.2        15.3,
'3':
   t_head    r_head  Q219  Q418
0      H5      Cash  1305  2072
1      H6  Expenses  1239  1117
}
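A more compact variant of the function above, keyed by prefix instead of by a counter, might look like the sketch below. The name `split_by_prefix`, its parameters, and the `other` key are hypothetical and do not come from the posts above; the sample frame is a cut-down version of the question's data. It keeps the same row rule as the function above: a row is kept only if none of its cells in that group is an empty string.

```python
import pandas as pd


def split_by_prefix(df, prefixes=('3ME__', '6ME__', '9ME__'),
                    id_cols=('t_head', 'r_head')):
    """Split df into one dataframe per column-name prefix plus an 'other' frame."""
    indexed = df.set_index(list(id_cols))
    result = {}
    used = []
    for prefix in prefixes:
        # Plain string test, so the prefix must be at the start of the name.
        cols = [c for c in indexed.columns if c.startswith(prefix)]
        used.extend(cols)
        part = indexed[cols]
        # Keep rows where every cell of this group is filled in.
        part = part.loc[(part != '').all(axis=1)]
        if not part.empty:
            result[prefix.rstrip('_')] = part.reset_index()
    # Whatever was not claimed by a prefix ends up under 'other'.
    rest = indexed.drop(columns=used)
    rest = rest.loc[(rest != '').all(axis=1)]
    if not rest.empty:
        result['other'] = rest.reset_index()
    return result


df = pd.DataFrame({
    't_head': ['H1', 'H2', 'H5'],
    'r_head': ['Revenue', 'Revenue', 'Cash'],
    '3ME__ Q219': [159.9, '', ''],
    '3ME__ Q218': [112.3, '', ''],
    '6ME__ Q219': ['', 280.9, ''],
    'Q219': ['', '', 1305],
})
parts = split_by_prefix(df)
print(list(parts))
```

Keying by prefix rather than by position means a missing group (here 9ME__) does not shift the keys of the groups that follow it.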