How to change the order of the result of pd.crosstab:
pd.crosstab(df['col1'], df['col2'])
I would like to be able to sort by:
Well, it would be easier to give you a solution if you provided an example of your data, since it can vary a lot accordingly. I will try to build a case scenario and possible solution below.
If we take the example data and crosstab:
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])
CT
We have the following output:
c    dull  shiny  weird
a
bar     2      1      1
foo     3      4      0
That's a regular DataFrame object; it has just been "crosstabbed" or, better yet, "pivot-tabled" accordingly.
You would like to show:
So let's start with "1":
There are different ways to do that. A simple solution is to show the same DataFrame with boolean values marking the singular cases (cells whose count is exactly 1):
CT == 1
However, that format might not be what you want for large DataFrames.
You could instead print only the positive cases, or append them to a list. A simple example would be:
for col in CT.columns:
    for index in CT.index:
        # Report every (row, column) pair whose count is exactly 1
        if CT.loc[index, col] == 1:
            print((index, col, 'singular'))
Output:
('bar', 'shiny', 'singular')
('bar', 'weird', 'singular')
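For larger tables, the same lookup can be done without explicit loops. This is only a minimal sketch, assuming the CT table from above (rebuilt here so the snippet runs on its own); the names rows, cols and singular are just illustrative:

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Positions (row number, column number) of every cell equal to 1
rows, cols = np.where(CT.to_numpy() == 1)

# Translate the positions back into (row label, column label) pairs
singular = list(zip(CT.index[rows], CT.columns[cols]))
print(singular)
```

This collects the pairs in a list instead of printing them one by one, which is usually handier if you want to reuse them later.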
The second item is more complicated. You want to order by the highest value, but the orderings can conflict: sorting the rows by one column will most likely give a different row order than sorting them by another column, because all columns share the same index.
Hence, you can choose to order by one specific column:
CT.sort_values('column_name', ascending=False)
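Applied to the example table, that could look like the sketch below. The column 'shiny' and the row 'foo' are picked arbitrarily for illustration; passing axis=1 sorts the columns by the values of a given row instead of sorting the rows:

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Reorder the rows by the counts in the 'shiny' column, largest first
by_shiny = CT.sort_values('shiny', ascending=False)
print(by_shiny)

# Reorder the columns by the counts in the 'foo' row, largest first
by_foo = CT.sort_values('foo', axis=1, ascending=False)
print(by_foo)
```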
Or you can define a metric to order by (for example, the mean value of each row) and sort accordingly.
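A rough sketch of the metric approach, using the row mean as the metric (any other per-row statistic would work the same way). The names row_mean, order and CT_by_mean are made up for this example:

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'foo', 'bar', 'bar',
              'bar', 'bar', 'foo', 'foo', 'foo'], dtype=object)
c = np.array(['dull', 'dull', 'shiny', 'dull', 'dull', 'weird',
              'shiny', 'dull', 'shiny', 'shiny', 'shiny'], dtype=object)
CT = pd.crosstab(a, c, rownames=['a'], colnames=['c'])

# Metric: the mean count of each row
row_mean = CT.mean(axis=1)

# Row labels ordered from the highest mean to the lowest
order = row_mean.sort_values(ascending=False).index

# Reorder the crosstab rows without adding a helper column
CT_by_mean = CT.loc[order]
print(CT_by_mean)
```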
Hope that helps!
import pandas as pd

# Region and Animal are filled in to match the dataframe printed below
df_stack = pd.DataFrame({'Country':['USA','USA','MEX','IND','UK','UK','UK'],
                         'Region':['Americas','Americas','Americas','Asia','Europe','Europe','Europe'],
                         'Flower':['Rose','Rose','Lily','Orchid','Dandelion','Dandelion','Dandelion'],
                         'Animal':['Bison','Bison','Golden Eagle','Tiger','Lion','Lion','Lion'],
                         'Game':['Baseball','Baseball','soccer','hockey','cricket','cricket','cricket']})
print("-------Normal Dataframe------")
print(df_stack)
# Build a crosstab that counts animals per region
crosstab = pd.crosstab(df_stack.Region,df_stack.Animal)
print("-------Before Sorting Crosstab------")
print(crosstab)
# Sort the rows by a specific column, in this case 'Lion'
crosstab = crosstab.sort_values(['Lion'], ascending=False)
print("-------After Sorting Crosstab by Lion Column------")
print(crosstab)
-------Normal Dataframe------
  Country    Region     Flower        Animal      Game
0     USA  Americas       Rose         Bison  Baseball
1     USA  Americas       Rose         Bison  Baseball
2     MEX  Americas       Lily  Golden Eagle    soccer
3     IND      Asia     Orchid         Tiger    hockey
4      UK    Europe  Dandelion          Lion   cricket
5      UK    Europe  Dandelion          Lion   cricket
6      UK    Europe  Dandelion          Lion   cricket
-------Before Sorting Crosstab------
Animal    Bison  Golden Eagle  Lion  Tiger
Region
Americas      2             1     0      0
Asia          0             0     0      1
Europe        0             0     3      0
-------After Sorting Crosstab by Lion Column------
Animal    Bison  Golden Eagle  Lion  Tiger
Region
Europe        0             0     3      0
Americas      2             1     0      0
Asia          0             0     0      1
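Americas and Asia tie on the Lion column in the output above, so their relative order is not determined by that sort key alone. One possible way to break such ties is to pass more than one column to sort_values. The sketch below assumes the Region and Animal values as they appear in the printed dataframe, and the choice of 'Tiger' as the second key is arbitrary:

```python
import pandas as pd

# Only the two columns used by the crosstab, as shown in the printed dataframe
df_stack = pd.DataFrame({
    'Region': ['Americas', 'Americas', 'Americas', 'Asia',
               'Europe', 'Europe', 'Europe'],
    'Animal': ['Bison', 'Bison', 'Golden Eagle', 'Tiger',
               'Lion', 'Lion', 'Lion'],
})
crosstab = pd.crosstab(df_stack.Region, df_stack.Animal)

# Sort by 'Lion' first; rows that tie on 'Lion' are then ordered by 'Tiger'
sorted_ct = crosstab.sort_values(['Lion', 'Tiger'], ascending=False)
print(sorted_ct)
```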