
Pandas: adding new column to existing Data Frame for grouping purposes

I have a pandas DataFrame with 2000 rows and 8 columns. I want to group the first 4 columns together, and likewise the other 4, but I can't figure out how. The goal is a categorical bar plot in which matching columns share a color: C1 with C5, C2 with C6, and so forth.

My Data Frame:

In[1]: df.head(5)
Out[1]: 

    C1  C2  C3  C4  C5  C6  C7  C8
0   15  37  17  10  8   11  19  86
1   39  84  11  5   5   13  9   11
2   10  20  30  51  74  62  56  58
3   88  2   1   3   9   6   0   17
4   17  17  32  24  91  45  63  48

Would you suggest adding another column, such as df['Gr'], or is there a better approach?

You can use MultiIndex.from_arrays to add an outer column level that labels the two groups:

import pandas as pd

df.columns = pd.MultiIndex.from_arrays([['a'] * 4 + ['b'] * 4, df.columns])
print(df)
    a               b            
   C1  C2  C3  C4  C5  C6  C7  C8
0  15  37  17  10   8  11  19  86
1  39  84  11   5   5  13   9  11
2  10  20  30  51  74  62  56  58
3  88   2   1   3   9   6   0  17
4  17  17  32  24  91  45  63  48
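Once the outer level is in place, each group can be pulled out by its label. Below is a minimal, self-contained sketch that rebuilds the five sample rows from the question; the names group_a, group_b and group_totals are only illustrative.

```python
import pandas as pd

# Sample data copied from the question's df.head(5).
df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)

# Outer level 'a' covers C1-C4, outer level 'b' covers C5-C8.
df.columns = pd.MultiIndex.from_arrays([["a"] * 4 + ["b"] * 4, df.columns])

# Selecting by the outer label returns a plain 4-column frame.
group_a = df["a"]  # columns C1..C4
group_b = df["b"]  # columns C5..C8

# Per-row totals for each group: transpose, group on the outer level, transpose back.
group_totals = df.T.groupby(level=0).sum().T
print(group_totals)
```

No extra column such as df['Gr'] is needed here, because the group label lives in the column index rather than in the data.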

Then you can select each group with xs and plot it with DataFrame.plot.bar:

import matplotlib.pyplot as plt

f, a = plt.subplots(2,1)
df.xs('a', axis=1).plot.bar(ax=a[0])
df.xs('b', axis=1).plot.bar(ax=a[1])
plt.show()
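If the aim is for C1 and C5, C2 and C6, and so on to share a color, one option is to pass the same color list to both plot.bar calls. The sketch below assumes that reading of the question. The palette is an arbitrary choice, and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question's df.head(5).
df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)
df.columns = pd.MultiIndex.from_arrays([["a"] * 4 + ["b"] * 4, df.columns])

# Assumed palette: one color per position within a group,
# so C1/C5 get colors[0], C2/C6 get colors[1], and so on.
colors = ["tab:blue", "tab:orange", "tab:green", "tab:red"]

fig, axes = plt.subplots(2, 1)
df.xs("a", axis=1).plot.bar(ax=axes[0], color=colors, rot=0)
df.xs("b", axis=1).plot.bar(ax=axes[1], color=colors, rot=0)
```

Call plt.show() afterwards (with an interactive backend) to display the two panels.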



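import matplotlib.pyplot as plt

# Label the two groups with an outer column level, then move that level
# into the rows with stack(0) and plot the transposed result.
df.columns = pd.MultiIndex.from_arrays([['a'] * 4 + ['b'] * 4, df.columns])
df.stack(0).T.plot.bar(rot=0, legend=False)  # rot is an angle in degrees, so pass 0, not '0'

# Replace the column labels with the flat group labels and plot the transpose.
df.columns = ['a'] * 4 + ['b'] * 4
df.T.plot.bar(rot=0)  # plot.bar returns Axes, so don't assign the result back to df

plt.show()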
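To see what stack(0) produces before it is transposed and plotted, it can help to inspect the intermediate frame. The sketch below rebuilds the five sample rows from the question; the name stacked is only illustrative. Recent pandas versions may emit a FutureWarning about the stack implementation.

```python
import pandas as pd

# Sample data copied from the question's df.head(5).
df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)
df.columns = pd.MultiIndex.from_arrays([["a"] * 4 + ["b"] * 4, df.columns])

# stack(0) moves the outer column level ('a'/'b') into the row index.
# Each original row becomes two rows, (row, 'a') and (row, 'b'); the columns
# C1..C8 remain, with NaN wherever a column does not belong to that group.
stacked = df.stack(0)
print(stacked.head(4))

# Transposing puts C1..C8 on the rows, which is what plot.bar puts on the x-axis.
print(stacked.T.shape)
```

With the full 2000-row frame this yields 4000 bar series, which is presumably why the plotting call passes legend=False.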

Use pd.concat with keys to build the two-level columns:

pd.concat([df.iloc[:, :4], df.iloc[:, 4:]], axis=1, keys=['first4', 'second4'])
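As a self-contained sketch of this approach on the five sample rows from the question: the keys become the outer column level, and the original frame is left untouched. The name grouped is only illustrative.

```python
import pandas as pd

# Sample data copied from the question's df.head(5).
df = pd.DataFrame(
    [[15, 37, 17, 10, 8, 11, 19, 86],
     [39, 84, 11, 5, 5, 13, 9, 11],
     [10, 20, 30, 51, 74, 62, 56, 58],
     [88, 2, 1, 3, 9, 6, 0, 17],
     [17, 17, 32, 24, 91, 45, 63, 48]],
    columns=[f"C{i}" for i in range(1, 9)],
)

# Split by position, then concatenate side by side; each key labels one piece.
grouped = pd.concat(
    [df.iloc[:, :4], df.iloc[:, 4:]],
    axis=1,
    keys=["first4", "second4"],
)

print(grouped)
print(grouped["second4"])  # plain frame with columns C5..C8
```

Unlike assigning to df.columns, pd.concat returns a new frame, so df keeps its flat column labels.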



 