Pandas - concatenating two multi-index dataframes
I have a dataframe as follows:
df.head()
                              Student Name   Q1   Q2   Q3
Month      Roll No
2016-08-01 0             Save Mithil Vinay  0.0  0.0  0.0
           1           Abraham Ancy Chandy  6.0  5.0  5.0
           2        Barabde Pranjal Sanjiv  7.0  5.0  5.0
           3          Bari Siddhesh Kishor  8.0  5.0  3.0
           4         Barretto Cleon Domnic  1.0  5.0  4.0
Now I wanted to make a hierarchical column index, so I did it the following way:
big_df = pd.concat([df['Student Name'], df[['Q1', 'Q2', 'Q3']]], axis=1, keys=['Name', 'IS'])
and got the following:
>>> big_df
                                      Name   IS
                              Student Name   Q1   Q2   Q3
Month      Roll No
2016-08-01 0             Save Mithil Vinay  0.0  0.0  0.0
           1           Abraham Ancy Chandy  6.0  5.0  5.0
           2        Barabde Pranjal Sanjiv  7.0  5.0  5.0
           3          Bari Siddhesh Kishor  8.0  5.0  3.0
           4         Barretto Cleon Domnic  1.0  5.0  4.0
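For anyone who wants to reproduce this step, here is a minimal sketch. The frame is rebuilt by hand from the rows shown above, and df[['Student Name']] (a one-column DataFrame) is used in place of df['Student Name'] so that both pieces passed to pd.concat are frames. That substitution is an assumption made for the sketch, not something the original code requires.

```python
import pandas as pd

# Sample data rebuilt by hand from the rows shown in the question.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2016-08-01"]), range(5)], names=["Month", "Roll No"]
)
df = pd.DataFrame(
    {
        "Student Name": [
            "Save Mithil Vinay",
            "Abraham Ancy Chandy",
            "Barabde Pranjal Sanjiv",
            "Bari Siddhesh Kishor",
            "Barretto Cleon Domnic",
        ],
        "Q1": [0.0, 6.0, 7.0, 8.0, 1.0],
        "Q2": [0.0, 5.0, 5.0, 5.0, 5.0],
        "Q3": [0.0, 5.0, 5.0, 3.0, 4.0],
    },
    index=index,
)

# Each key becomes the top column level of the frame in the same position.
big_df = pd.concat(
    [df[["Student Name"]], df[["Q1", "Q2", "Q3"]]],
    axis=1,
    keys=["Name", "IS"],
)
print(big_df.columns.tolist())
```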
Now, for the second iteration, I want to concatenate only the Q1, Q2, Q3 values from the new dataframe to big_df (the previously concatenated dataframe).
The dataframe for the second iteration is as follows:
                              Student Name   Q1   Q2   Q3
Month      Roll No
2016-08-01 0             Save Mithil Vinay  0.0  0.0  0.0
           1           Abraham Ancy Chandy  8.0  5.0  5.0
           2        Barabde Pranjal Sanjiv  7.0  5.0  4.0
           3          Bari Siddhesh Kishor  8.0  4.0  3.0
           4         Barretto Cleon Domnic  2.0  3.0  4.0
I want big_df to look like the following:
                                      Name   IS             CC
                              Student Name   Q1   Q2   Q3   Q1   Q2   Q3
Month      Roll No
2016-08-01 0             Save Mithil Vinay  0.0  0.0  0.0  0.0  0.0  0.0
           1           Abraham Ancy Chandy  6.0  5.0  5.0  8.0  5.0  5.0
           2        Barabde Pranjal Sanjiv  7.0  5.0  5.0  7.0  5.0  4.0
           3          Bari Siddhesh Kishor  8.0  5.0  3.0  8.0  4.0  3.0
           4         Barretto Cleon Domnic  1.0  5.0  4.0  2.0  3.0  4.0
I tried the following code, but every attempt gives an error:
big_df.concat([df[['Q1', 'Q2', 'Q3']]], axis=1, keys=['CC'])
pd.concat([big_df, df[['Q1', 'Q2', 'Q3']]], axis=1, keys=['Name', 'CC'])
Where am I going wrong? Kindly help. I am new to Pandas.
Drop the topmost column level of big_df:
big_df.columns = big_df.columns.droplevel(level=0)
Then concatenate, passing three separate frames as input so that their number matches the number of keys to be used:
Q_cols = ['Q1', 'Q2', 'Q3']
key_names = ['Name', 'IS', 'CC']
pd.concat([big_df[['Student Name']], big_df[Q_cols], df[Q_cols]], axis=1, keys=key_names)
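The two steps above can be put together as a runnable sketch. The frames first and second are hypothetical names for the first- and second-iteration data, rebuilt by hand from the question, and the flattening is done on a copy (flat) so that big_df itself is left untouched; both choices are assumptions made for the illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question; `first` and `second`
# are hypothetical names for the two iterations.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2016-08-01"]), range(5)], names=["Month", "Roll No"]
)
names = [
    "Save Mithil Vinay",
    "Abraham Ancy Chandy",
    "Barabde Pranjal Sanjiv",
    "Bari Siddhesh Kishor",
    "Barretto Cleon Domnic",
]
first = pd.DataFrame(
    {"Student Name": names, "Q1": [0.0, 6.0, 7.0, 8.0, 1.0],
     "Q2": [0.0, 5.0, 5.0, 5.0, 5.0], "Q3": [0.0, 5.0, 5.0, 3.0, 4.0]},
    index=index,
)
second = pd.DataFrame(
    {"Student Name": names, "Q1": [0.0, 8.0, 7.0, 8.0, 2.0],
     "Q2": [0.0, 5.0, 5.0, 4.0, 3.0], "Q3": [0.0, 5.0, 4.0, 3.0, 4.0]},
    index=index,
)

q_cols = ["Q1", "Q2", "Q3"]
key_names = ["Name", "IS", "CC"]

# State after the first iteration, as in the question.
big_df = pd.concat(
    [first[["Student Name"]], first[q_cols]], axis=1, keys=["Name", "IS"]
)

# Step 1: drop the top column level (on a copy, to keep big_df intact).
flat = big_df.copy()
flat.columns = flat.columns.droplevel(level=0)

# Step 2: one frame per key, in the same order as key_names.
result = pd.concat(
    [flat[["Student Name"]], flat[q_cols], second[q_cols]],
    axis=1,
    keys=key_names,
)
print(result)
```

Note that this relies on the flattened column names being unique. Once big_df holds both an IS and a CC group, dropping the top level would leave duplicate Q1/Q2/Q3 labels, so the selection flat[q_cols] would no longer pick out a single group.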
First, you're much better off setting your index to ['Month', 'Roll No', 'Student Name']. That will simplify your concat syntax a lot and ensure you match on the students' names too.
df.set_index('Student Name', append=True, inplace=True)
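As a quick check of what that call does, here is a small sketch on data rebuilt by hand from the question. It uses the non-inplace form of set_index, which is a stylistic assumption here; the effect on the index is the same.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2016-08-01"]), range(5)], names=["Month", "Roll No"]
)
df = pd.DataFrame(
    {
        "Student Name": [
            "Save Mithil Vinay",
            "Abraham Ancy Chandy",
            "Barabde Pranjal Sanjiv",
            "Bari Siddhesh Kishor",
            "Barretto Cleon Domnic",
        ],
        "Q1": [0.0, 6.0, 7.0, 8.0, 1.0],
        "Q2": [0.0, 5.0, 5.0, 5.0, 5.0],
        "Q3": [0.0, 5.0, 5.0, 3.0, 4.0],
    },
    index=index,
)

# append=True keeps the existing Month / Roll No levels and adds
# Student Name as a third index level instead of replacing them.
df = df.set_index("Student Name", append=True)

print(list(df.index.names))
print(df.columns.tolist())
```

After this, only the Q1/Q2/Q3 columns remain as data, so the whole frame can be concatenated without selecting columns first.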
Second, I suggest you do it differently: during your iteration, store your df dataframes (with the Q1/Q2/Q3 values) under the name of their top column level (e.g. 'IS', 'CC'). A dict is perfect for this, and pandas accepts a dict as the argument to pd.concat.
# Create a dictionary holding the first df from your question
df_dict = {'IS': df}
# Iterate....
# Add the new df to df_dict
df_dict['CC'] = df
Now, after looping through, here's your dict:
In [10]: df_dict
Out[10]:
{'CC':                                      Q1   Q2   Q3
Month      Roll No Student Name
2016-08-01 0       Save Mithil Vinay       0.0  0.0  0.0
           1       Abraham Ancy Chandy     8.0  5.0  5.0
           2       Barabde Pranjal Sanjiv  7.0  5.0  4.0
           3       Bari Siddhesh Kishor    8.0  4.0  3.0
           4       Barretto Cleon Domnic   2.0  3.0  4.0,
'IS':                                      Q1   Q2   Q3
Month      Roll No Student Name
2016-08-01 0       Save Mithil Vinay       0.0  0.0  0.0
           1       Abraham Ancy Chandy     6.0  5.0  5.0
           2       Barabde Pranjal Sanjiv  7.0  5.0  5.0
           3       Bari Siddhesh Kishor    8.0  5.0  3.0
           4       Barretto Cleon Domnic   1.0  5.0  4.0}
So now if you concat, pandas does it nicely and automatically for you:
In [11]: big_df = pd.concat(df_dict, axis=1)
big_df
Out[11]:
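A self-contained sketch of the dict approach follows. The helper make_frame and the constant NAMES are hypothetical, introduced only to rebuild the two iterations from the question with 'Student Name' already in the index.

```python
import pandas as pd

NAMES = [
    "Save Mithil Vinay",
    "Abraham Ancy Chandy",
    "Barabde Pranjal Sanjiv",
    "Bari Siddhesh Kishor",
    "Barretto Cleon Domnic",
]


def make_frame(q1, q2, q3):
    """Hypothetical helper: one iteration's scores, indexed by
    Month, Roll No and Student Name."""
    index = pd.MultiIndex.from_arrays(
        [pd.to_datetime(["2016-08-01"] * 5), list(range(5)), NAMES],
        names=["Month", "Roll No", "Student Name"],
    )
    return pd.DataFrame({"Q1": q1, "Q2": q2, "Q3": q3}, index=index)


# Collect one frame per top-level column name while iterating.
df_dict = {}
df_dict["IS"] = make_frame(
    [0.0, 6.0, 7.0, 8.0, 1.0], [0.0, 5.0, 5.0, 5.0, 5.0], [0.0, 5.0, 5.0, 3.0, 4.0]
)
df_dict["CC"] = make_frame(
    [0.0, 8.0, 7.0, 8.0, 2.0], [0.0, 5.0, 5.0, 4.0, 3.0], [0.0, 5.0, 4.0, 3.0, 4.0]
)

# The dict keys become the top column level; rows are aligned on the index.
big_df = pd.concat(df_dict, axis=1)
print(big_df)
```

Because the student names are part of the index, rows are matched on month, roll number and name together, which is the alignment benefit mentioned earlier in this answer.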
If you really want to do it iteratively, you should prepend the new top column level ('CC') to df before concatenating it with big_df:
df.columns = pd.MultiIndex.from_tuples([('CC', x) for x in df.columns])
# Then you can concat; this gives the same result as above.
pd.concat([big_df, df], axis=1)
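The iterative variant can be sketched end to end as follows. The names first, second and new are hypothetical, the data is rebuilt by hand from the question with 'Student Name' already in the index, and the initial big_df is created with a one-entry dict, which is an assumption about how the first iteration was stored.

```python
import pandas as pd

# Sample data rebuilt by hand from the question, with Student Name
# already moved into the index.
index = pd.MultiIndex.from_arrays(
    [
        pd.to_datetime(["2016-08-01"] * 5),
        list(range(5)),
        [
            "Save Mithil Vinay",
            "Abraham Ancy Chandy",
            "Barabde Pranjal Sanjiv",
            "Bari Siddhesh Kishor",
            "Barretto Cleon Domnic",
        ],
    ],
    names=["Month", "Roll No", "Student Name"],
)
first = pd.DataFrame(
    {"Q1": [0.0, 6.0, 7.0, 8.0, 1.0], "Q2": [0.0, 5.0, 5.0, 5.0, 5.0],
     "Q3": [0.0, 5.0, 5.0, 3.0, 4.0]},
    index=index,
)
second = pd.DataFrame(
    {"Q1": [0.0, 8.0, 7.0, 8.0, 2.0], "Q2": [0.0, 5.0, 5.0, 4.0, 3.0],
     "Q3": [0.0, 5.0, 4.0, 3.0, 4.0]},
    index=index,
)

# First iteration: big_df starts with a single 'IS' group.
big_df = pd.concat({"IS": first}, axis=1)

# Later iteration: give the new frame its own top level, then append it.
new = second.copy()
new.columns = pd.MultiIndex.from_tuples([("CC", col) for col in new.columns])
big_df = pd.concat([big_df, new], axis=1)

print(big_df.columns.tolist())
```

Both frames must have the same number of column levels before the final concat, which is why the new frame gets its 'CC' level first.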