
Pandas: change order of crosstab result

How can I change the order of the rows and columns in the result of pd.crosstab?

pd.crosstab(df['col1'], df['col2'])

I would like to be able to sort by:

  • unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
  • marginal values (e.g. showing higher-count values of df['col1'] closer to the top)

It would be easier to suggest a solution if you had provided an example of your data, since the right approach can vary a lot with it. I will build an example scenario and a possible solution below.

If we take the example data and crosstab:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
       'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)

c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
       'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)

# Cross-tabulate a (rows) against c (columns)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

CT

We have the following output:

c    dull  shiny  weird
a                      
bar     2      1      1
foo     3      4      0

That's a regular DataFrame object; it has just been cross-tabulated (or, better yet, pivoted) accordingly.

You would like to show:

  1. unique values of either df['col1'] or df['col2'] (cols/rows of the crosstab result)
  2. an ordering by marginal values (e.g. showing higher-count values of df['col1'] closer to the top)

So let's start with item 1:

There are different ways to do that. A simple solution is to show the same DataFrame with boolean values marking the singular cases (cells whose count is exactly 1):

CT == 1

c     dull  shiny  weird
a                       
bar  False   True   True
foo  False  False  False

However, that format might not be what you want for large DataFrames.

You could instead print only the positive cases, or append them to a list. A simple example would be:

# Walk every cell and report the (row, column) pairs whose count is exactly 1
for col in CT.columns:
    for index in CT.index:
        if CT.loc[index, col] == 1:
            print(index, col, 'singular')

Output:

bar shiny singular
bar weird singular
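
The same lookup can probably be done without explicit loops. The following is a minimal sketch, assuming the example arrays from above; the variable names `rows`, `cols` and `pairs` are illustrative and not part of the original answer.

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
       'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
       'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# np.where on a 2-D boolean array returns the row and column
# positions of every True cell, here every cell equal to 1.
rows, cols = np.where(CT.to_numpy() == 1)

# Translate the integer positions back into the crosstab's labels.
pairs = list(zip(CT.index[rows], CT.columns[cols]))
print(pairs)

Collecting the pairs in a list rather than printing them one by one makes it easier to reuse them later, for instance to filter the original data.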

The second item is more complicated. You want the rows with higher counts to come first, but the columns may disagree with each other: the row order that sorts one column from high to low will most likely not sort a second column (which shares the same index) the same way.

Hence, you can choose to order by one specific column:

CT.sort_values('column_name', ascending=False)

Or you can define a metric to order by (for example, the mean value of each row) and sort accordingly.
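
One way to sort by a metric, sketched here under the assumption that "marginal values" means the row and column totals of the crosstab, is to compute the metric as a Series, sort it, and reuse its index to reorder the table. The names `row_totals`, `col_totals`, `CT_by_totals` and `CT_by_mean` are illustrative only.

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
       'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
       'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Marginal totals: one count per row label and one per column label.
row_totals = CT.sum(axis=1)
col_totals = CT.sum(axis=0)

# Sort the totals from high to low and reuse the resulting label order.
# mergesort is a stable sort, chosen here so that tied totals are
# handled deterministically.
row_order = row_totals.sort_values(ascending=False, kind='mergesort').index
col_order = col_totals.sort_values(ascending=False, kind='mergesort').index
CT_by_totals = CT.loc[row_order, col_order]
print(CT_by_totals)

# The same pattern works for any other per-row metric, e.g. the row mean.
CT_by_mean = CT.loc[CT.mean(axis=1).sort_values(ascending=False).index]
print(CT_by_mean)

With this data, 'foo' (7 observations) moves above 'bar' (4 observations), and 'weird' (1 observation) becomes the last column.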

Hope that helps!

import numpy as np
import pandas as pd

df_stack = pd.DataFrame({'Country':['USA','USA','MEX','IND','UK','UK','UK'],
               'Region':['Americas',np.nan,np.nan,'Asia','Europe',np.nan,np.nan],
               'Flower':['Rose','Rose','Lily','Orchid','Dandelion','Dandelion','Dandelion'],
               'Animal':['Bison',np.nan,'Golden Eagle','Tiger','Lion','Lion',np.nan],
               'Game':['Baseball','Baseball','soccer','hockey','cricket','cricket','cricket']})
# Forward-fill the missing values (assumed step; the printed output has no NaN)
df_stack = df_stack.ffill()
print("-------Normal Dataframe------")
print(df_stack)
# Create a crosstab counting animals per region
crosstab = pd.crosstab(df_stack.Region,df_stack.Animal)
print("-------Before Sorting Crosstab------")
print(crosstab)
# Sort the rows by one specific column, in this case 'Lion'
crosstab = crosstab.sort_values(['Lion'], ascending=False)
print("-------After Sorting Crosstab by Lion Column------")
print(crosstab)

-------Normal Dataframe------
  Country    Region     Flower        Animal      Game
0     USA  Americas       Rose         Bison  Baseball
1     USA  Americas       Rose         Bison  Baseball
2     MEX  Americas       Lily  Golden Eagle    soccer
3     IND      Asia     Orchid         Tiger    hockey
4      UK    Europe  Dandelion          Lion   cricket
5      UK    Europe  Dandelion          Lion   cricket
6      UK    Europe  Dandelion          Lion   cricket
-------Before Sorting Crosstab------
Animal    Bison  Golden Eagle  Lion  Tiger
Region                                    
Americas      2             1     0      0
Asia          0             0     0      1
Europe        0             0     3      0
-------After Sorting Crosstab by Lion Column------
Animal    Bison  Golden Eagle  Lion  Tiger
Region                                    
Europe        0             0     3      0
Americas      2             1     0      0
Asia          0             0     0      1
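
The question also asks about ordering by the labels themselves rather than by the counts. A minimal sketch, assuming a reduced version of the data above with only the Region and Animal columns: `sort_index` orders the labels alphabetically, and `reindex` applies a hand-picked order. The lists `region_order` and `animal_order` are hypothetical examples.

import pandas as pd

df = pd.DataFrame({
    'Region': ['Americas', 'Americas', 'Americas', 'Asia',
               'Europe', 'Europe', 'Europe'],
    'Animal': ['Bison', 'Bison', 'Golden Eagle', 'Tiger',
               'Lion', 'Lion', 'Lion'],
})
ct = pd.crosstab(df['Region'], df['Animal'])

# Reverse alphabetical order for the row labels, then the column labels.
ct_desc = ct.sort_index(axis=0, ascending=False).sort_index(axis=1, ascending=False)
print(ct_desc)

# An explicit order chosen by hand (hypothetical lists for illustration).
region_order = ['Europe', 'Asia', 'Americas']
animal_order = ['Lion', 'Tiger', 'Bison', 'Golden Eagle']
ct_custom = ct.reindex(index=region_order, columns=animal_order)
print(ct_custom)

Note that `reindex` inserts rows or columns filled with NaN for any label in the lists that is missing from the crosstab, so the lists should contain only labels that actually occur.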

