
Keep column index after pandas.melt?

I have a data frame of values varying over time. For example, the number of cars I observe on a street:

df = pd.DataFrame(
    [{'Orange': 0, 'Green': 2, 'Blue': 1},
     {'Orange': 2, 'Green': 4, 'Blue': 4},
     {'Orange': 1, 'Green': 3, 'Blue': 10}
    ])

I want to create graphs that highlight the cars with the highest values, so I sort the columns by their maximum value.

df = df.loc[:, df.max().sort_values(ascending=False).index]
   Blue  Green  Orange
0     1      2       0
1     4      4       2
2    10      3       1

I'm using seaborn to create these graphs. From what I understand, I need to melt this representation into a tidy format.

tidy = pd.melt(df.reset_index(), id_vars=['index'], var_name='color', value_name='number')
   index   color  number
0      0    Blue      1
1      1    Blue      4
2      2    Blue     10
3      0   Green      2
4      1   Green      4
5      2   Green      3
6      0  Orange      0
7      1  Orange      2
8      2  Orange      1

How can I add a column that represents the column order before the data frame was melted?

   index   color  number   importance
0      0    Blue      1            0
1      1    Blue      4            0
2      2    Blue     10            0
3      0   Green      2            1
4      1   Green      4            1
5      2   Green      3            1
6      0  Orange      0            2 
7      1  Orange      2            2
8      2  Orange      1            2

I see that I can still find the maximum columns after melting, but I'm not sure how to add that as a new column to the data frame:

tidy.groupby('color').number.max().sort_values(ascending=False).index
Index(['Blue', 'Green', 'Orange'], dtype='object', name='color')

EDIT: To clarify, I'm plotting this on a line graph.

axes = sns.relplot(data=tidy, x='index', y='number', hue='color', kind="line")

This is what the graph currently looks like: [image: sample car graph]

I want to use the importance data to either color / bold the lines, or split the graph into multiple graphs, so it looks something like these:

[images: two sample car graphs]

You can make a MultiIndex on the columns, then stack both levels.

# Map color to importance
d = (df.max().rank(method='dense', ascending=False)-1).astype(int)

df.columns = pd.MultiIndex.from_arrays([df.columns, df.columns.map(d)],
                                       names=['color', 'importance'])
#color      Orange Green Blue
#importance      2     1    0
#0               0     2    1
#1               2     4    4
#2               1     3   10

df = df.rename_axis(index='index').stack([0,1]).to_frame('value').reset_index()

   index   color  importance  value
0      0    Blue           0    1.0
1      0   Green           1    2.0
2      0  Orange           2    0.0
3      1    Blue           0    4.0
4      1   Green           1    4.0
5      1  Orange           2    2.0
6      2    Blue           0   10.0
7      2   Green           1    3.0
8      2  Orange           2    1.0
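To make the mapping d above easier to inspect on its own, here is a minimal sketch that rebuilds the question's frame and computes only the color-to-importance mapping. The names col_max, importance and importance_by_color are illustrative and not part of the answer above.

```python
import pandas as pd

# Rebuild the example frame from the question (unsorted column order)
df = pd.DataFrame(
    [{'Orange': 0, 'Green': 2, 'Blue': 1},
     {'Orange': 2, 'Green': 4, 'Blue': 4},
     {'Orange': 1, 'Green': 3, 'Blue': 10}])

# Per-column maxima: Orange 2, Green 4, Blue 10
col_max = df.max()

# Dense descending rank gives 1 to the largest maximum; subtracting 1
# makes the importance zero-based. Columns with equal maxima would
# share the same importance value.
importance = (col_max.rank(method='dense', ascending=False) - 1).astype(int)

# Plain dict keyed by color, handy for Series.map or Index.map
importance_by_color = importance.to_dict()
print(importance_by_color)
```

Because the rank is computed from the maxima rather than from column positions, this mapping does not depend on whether df was sorted beforehand.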

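Another option builds on the melt that you already have and derives the importance column afterwards. It assumes df is still the pre-melt frame with its columns sorted by maximum value, so that each color's position in the column list is its importance: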

tidy["importance"] = tidy["color"].map(df.columns.to_list().index)
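A variant of the same idea, as a sketch only: building an explicit dict from the sorted column order avoids relying on df itself being sorted and avoids a list search for every row. The names order and rank_of are hypothetical helpers introduced here, not something from the question or the answers.

```python
import pandas as pd

# Example frame from the question, left in its original column order
df = pd.DataFrame(
    [{'Orange': 0, 'Green': 2, 'Blue': 1},
     {'Orange': 2, 'Green': 4, 'Blue': 4},
     {'Orange': 1, 'Green': 3, 'Blue': 10}])

# Colors ordered by their maximum value, largest first
order = df.max().sort_values(ascending=False).index

# Position in that ordering is the importance (0 = highest maximum)
rank_of = {color: i for i, color in enumerate(order)}

# Melt to tidy format exactly as in the question
tidy = pd.melt(df.reset_index(), id_vars=['index'],
               var_name='color', value_name='number')

# Look up each row's importance from its color
tidy['importance'] = tidy['color'].map(rank_of)
print(tidy)
```

With the importance column in place, passing something like col='importance' or size='importance' to sns.relplot should split the plot into panels or vary the line width, though the exact styling is left to taste.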
