Let's say I have the following code:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(10,4), columns=['A','B','C','D'])
df1['dataframe'] = 'df1'
df2 = pd.DataFrame(np.random.randn(10,4), columns=['A','B','C','D'])
df2['dataframe'] = 'df2'
df = pd.concat([df1, df2])
df.reset_index().set_index(['dataframe','index'])
This returns a dataframe with 2 index levels, 'dataframe' and 'index'. I'm not sure what the correct term is, but visually the first index spans across rows rather than columns.
There are 2 operations I would like to perform on this dataframe that I am struggling with.
1) I would like to rename the columns in each "sub-dataframe" to names built from a separate list, applied according to the first index level assigned above. I have tried the following, but the change is not there when I display "df" again:
new_cols = ['df1', 'df2']
for i,x in enumerate(new_cols):
    old_cols = df.loc[x].columns.tolist()
    df.loc[x].rename(columns={col_label: '{}_{}'.format(x,col_label) for col_label in old_cols}, inplace=True)
So, to be clear, instead of A, B, C, D I'd like df1_A...df1_D and df2_A...df2_D.
2) I would like to re-orient this dataframe so that the "sub-dataframes" span across the columns, so I would scroll across to view each one rather than up and down.
I've consulted the pandas API but am still not able to get this right.
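OK, from that starting point you first want to move the 'dataframe' level back out of the index. Assuming you assigned the result of your set_index call back to df, reset just that level (reset_index(drop=True) would throw the 'dataframe' labels away instead of turning them into a column):
df = df.reset_index(level='dataframe')
If you never assigned that result, df already has the right shape and you can skip this step. Either way, you should now have only one level in the index, and the columns A, B, C, D, and dataframe.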
Now pivot on the dataframe column:
df_pivot = df.pivot(columns='dataframe')
You now have a dataframe with hierarchically-indexed columns, which will allow you to scroll across and see A, B, C, and D at the top level, with df1 and df2 just underneath.
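To make the pivot step concrete, here is a minimal end-to-end sketch. It assumes the frame is in the single-index shape described above, with 'dataframe' as an ordinary column, and it rebuilds the question's random data rather than reusing any particular values.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: two 10x4 blocks tagged by a 'dataframe' column.
df1 = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
df1["dataframe"] = "df1"
df2 = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
df2["dataframe"] = "df2"
df = pd.concat([df1, df2])  # row labels 0..9 appear twice, once per block

# No index= argument is given, so pivot keeps the existing row labels as the index
# and spreads the 'dataframe' values across a second column level.
df_pivot = df.pivot(columns="dataframe")

print(df_pivot.shape)
print(df_pivot.columns.tolist()[:4])
```

The duplicated row labels are not a problem here because each (row label, 'dataframe') pair occurs only once.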
If you just want to explore the data, it's actually best to stop there. You'll be able to index in a natural way without renaming columns, and it'll be easy to explore the data by scrolling horizontally. To index into the A and df1 values, you'd write:
df_pivot['A']['df1']
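As a hedged aside, the same column can be selected in one lookup with a tuple, and the two column levels can be swapped if you would rather see each original dataframe as one contiguous block. The sketch below uses a stand-in df_pivot built directly with the same column layout; the name by_source is only illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for df_pivot: columns are (letter, source) pairs, as pivot produces.
cols = pd.MultiIndex.from_product([list("ABCD"), ["df1", "df2"]],
                                  names=[None, "dataframe"])
df_pivot = pd.DataFrame(np.random.randn(10, 8), columns=cols)

# A tuple selects one column in a single lookup.
a_df1 = df_pivot[("A", "df1")]

# Move 'dataframe' to the top level and sort so each block is contiguous.
by_source = df_pivot.swaplevel(axis=1).sort_index(axis=1)

print(by_source["df1"].columns.tolist())
```

With the levels swapped, by_source['df1'] gives back a frame with just A, B, C, D, which matches the "scroll across each sub-dataframe" layout from the question.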
That's a natural syntax. But if you really did want to add in the underscores, you could add them like so:
df_pivot.columns = ['_'.join(col[::-1]).strip() for col in df_pivot.columns.values]
Because the pivot table's columns are represented by a MultiIndex, df_pivot.columns.values returns an array of tuples. Each tuple is something like ('A','df1'), so if you want column names of the form df1_A you do need the [::-1] I've added there, so that you join the tuple members in reverse order. If you're happy with the other order for column names (A_df1), then you can remove the reverse step:
df_pivot.columns = ['_'.join(col).strip() for col in df_pivot.columns.values]
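If the slicing reads as cryptic, one possible alternative is to unpack each tuple by name. This is only a sketch on a small stand-in frame; the variable names letter, source, and flat are illustrative and not part of the original answer.

```python
import pandas as pd

# Stand-in for df_pivot with (letter, source) column tuples.
cols = pd.MultiIndex.from_product([["A", "B"], ["df1", "df2"]])
df_pivot = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# Iterating over a MultiIndex yields one tuple per column.
flat = df_pivot.copy()
flat.columns = [f"{source}_{letter}" for letter, source in df_pivot.columns]

print(flat.columns.tolist())  # ['df1_A', 'df2_A', 'df1_B', 'df2_B']
```

Working on a copy keeps the hierarchical version around in case you still want to index it by level.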
Using the dataframe generated by the original code snippet, we can create two separate dataframes based on which dataframe each row belongs to, then drop the "dataframe" column from both. We then assign the new column names to the columns property of each of the two dataframes. Finally, we index df1 with the list of df2 column names; since those columns don't exist yet, pandas creates them and we assign df2's values to them.
Edit: Added a line from the pandas cookbook that turns the columns into a MultiIndex.
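df1 = df[df["dataframe"] == "df1"].copy().drop("dataframe", axis=1)
df2 = df[df["dataframe"] == "df2"].copy().drop("dataframe", axis=1)
# Prefix each label; the result is already an Index, so no list wrapper is needed.
df1.columns = "df1_" + df1.columns
df2.columns = "df2_" + df2.columns
# The df2_* columns don't exist in df1 yet, so this creates them (rows align on the index).
df1[df2.columns] = df2
# Split 'df1_A' into ('df1', 'A') to build two-level column labels.
df1.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df1.columns])
print(df1)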
Output (New):
df1 df2 \
A B C D A B C
0 -0.228363 0.675313 -0.076193 -0.805547 0.920632 0.789152 0.275401
1 0.145603 0.422236 0.623796 0.233534 2.338283 -1.033269 -0.334333
2 -0.526696 0.307727 0.478437 -0.068488 -0.475583 -0.802997 -0.059091
3 -1.676880 -0.272451 -0.777794 0.490290 1.456024 0.340962 -0.436860
4 1.203065 -0.198686 -1.065447 1.188931 -1.140757 0.046975 -2.596953
5 -0.603939 0.734130 -0.321634 0.150161 2.228873 0.748693 -0.300975
6 1.028938 0.114437 0.268499 0.260428 -1.896507 0.136147 0.004577
7 -1.329070 -0.901562 -1.401573 0.715426 -1.711233 0.420301 0.643113
8 2.033646 -0.550192 1.532104 -1.196995 -0.004135 -1.334320 0.110115
9 -0.818145 -1.240037 0.880706 -0.625155 -0.672653 0.365357 -0.864840
D
0 -0.888626
1 -0.952065
2 0.249387
3 0.952315
4 -1.804463
5 -0.428231
6 -0.257906
7 1.551899
8 0.054855
9 0.679394
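For comparison, and only as a possible shortcut rather than what the answer above does, the same two-level column layout can be built directly from the two original frames with pd.concat and its keys argument. The sketch assumes you still have df1 and df2 before the 'dataframe' column was added; the name wide is illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))

# keys= labels each block; axis=1 places the blocks side by side.
wide = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

print(wide.columns.tolist()[:2])  # [('df1', 'A'), ('df1', 'B')]
```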