Python pandas sort_values() with nested list

Question

I want to sort a nested dict in Python using pandas.

import pandas as pd 

# Data structure (nested list):
# {
#   category_name: [[rank, id], ...],
#   ...
# }

all_categories = {
    "category_name1": [[2, 12345], [1, 32512], [3, 32382]],
    "category_name2": [[3, 12345], [9, 25318], [1, 24623]]
}

df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank'])
df.sort_values(['Rank'], ascending=True, inplace=True) # this only sorts the list of lists

Can anyone tell me how to reach my goal? With pandas it's possible to sort_values() by the second column, but I can't figure out how to sort the nested dict/list.

I want to sort ascending by the rank, not the id.

Answer 1

The fastest option is to apply list.sort() (the sort happens in place and returns None, so don't assign the result back to df.Rank in this case):

df.Rank.apply(list.sort)

Or apply sorted() with a custom key and assign back to df.Rank:

df.Rank = df.Rank.apply(lambda row: sorted(row, key=lambda x: x[0]))

Output in either case:

>>> df
         Category                                  Rank
0  category_name1  [[1, 32512], [2, 12345], [3, 32382]]
1  category_name2  [[1, 24623], [3, 12345], [9, 25318]]
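
The difference between the two calls can be checked with a minimal sketch, assuming the all_categories data from the question. The variable names sorted_copy and returned are only illustrative.

```python
import pandas as pd

all_categories = {
    "category_name1": [[2, 12345], [1, 32512], [3, 32382]],
    "category_name2": [[3, 12345], [9, 25318], [1, 24623]],
}
df = pd.DataFrame(list(all_categories.items()), columns=["Category", "Rank"])

# sorted() builds a new list per row; the key compares the rank (first item) only.
sorted_copy = df["Rank"].apply(lambda row: sorted(row, key=lambda pair: pair[0]))

# list.sort() mutates each list in place and returns None, so apply() yields
# a Series of None that should not be assigned back to the column.
returned = df["Rank"].apply(list.sort)

print(returned.isna().all())                        # every returned value is None
print(df["Rank"].tolist() == sorted_copy.tolist())  # both approaches agree here
```

Without a key, list.sort() compares whole [rank, id] pairs, so ties on rank would be broken by id. With the ranks in this data being unique per category, both approaches give the same order.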

This is the perfplot benchmark of sort() vs sorted() vs explode():

import perfplot

def explode(df):
    df = df.explode('Rank')
    df['rank_num'] = df.Rank.str[0]
    df = df.sort_values(['Category', 'rank_num']).groupby('Category', as_index=False).agg(list)
    return df

def apply_sort(df):
    df.Rank.apply(list.sort)
    return df

def apply_sorted(df):
    df.Rank = df.Rank.apply(lambda row: sorted(row, key=lambda x: x[0]))
    return df

perfplot.show(
    setup=lambda n: pd.concat([df] * n),
    n_range=[2 ** k for k in range(25)],
    kernels=[explode, apply_sort, apply_sorted],
    equality_check=None,
)

To sort only the rows whose lists reach a given length (here, at least 10 items), build a mask with str.len() and select those rows with loc[]:

mask = df.Rank.str.len().ge(10)
df.loc[mask, 'Rank'].apply(list.sort)
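
As a small sketch of why this works, assuming made-up data and a threshold of 3 instead of 10: the selection made by loc[] holds the same list objects as the original column, so the in-place sort is visible in df even though nothing is assigned back.

```python
import pandas as pd

# Hypothetical data: one short list and one long list, both unsorted.
data = {
    "short": [[2, 10], [1, 20]],
    "long": [[3, 30], [1, 40], [2, 50]],
}
df = pd.DataFrame(list(data.items()), columns=["Category", "Rank"])

# True only for rows whose list has at least 3 entries (illustrative threshold).
mask = df["Rank"].str.len().ge(3)

# The selected Series references the original lists, so sorting them in place
# changes what df holds; the unselected row is left untouched.
df.loc[mask, "Rank"].apply(list.sort)

print(df)
```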

Answer 2

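Try:

df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank']).explode('Rank')
# sort the exploded [rank, id] pairs by rank; sorted(x) would only reorder the two numbers inside each pair
df = df.sort_values('Rank', key=lambda s: s.str[0])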

df = df.groupby('Category').agg(list).reset_index()

To convert the result back to a dict:

dict(df.agg(list, axis=1).values)
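
An alternative sketch for the same conversion, assuming a frame with exactly the Category and Rank columns, pairs the two columns with zip(). The name result is only illustrative.

```python
import pandas as pd

# Frame as it looks after the per-category lists have been sorted by rank.
df = pd.DataFrame(
    [
        ("category_name1", [[1, 32512], [2, 12345], [3, 32382]]),
        ("category_name2", [[1, 24623], [3, 12345], [9, 25318]]),
    ],
    columns=["Category", "Rank"],
)

# Map each category name to its list of [rank, id] pairs.
result = dict(zip(df["Category"], df["Rank"]))

print(result)
```

This names the two columns explicitly, so it keeps working if the frame later gains extra columns.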

Answer 3

Try:

df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank'])
df.set_index('Rank', inplace=True)
df.sort_index(inplace=True)
df.reset_index(inplace=True)

Or:

df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank'])
df = df.set_index('Rank').sort_index().reset_index()

Answer 4

It is much more efficient to use df.explode() and then sort the values, since that approach is vectorized.

df = df.explode('Rank')
df['rank_num'] = df.Rank.str[0]

# wrap the chain in parentheses so it can span several lines, and keep the result
df = (
    df.sort_values(['Category', 'rank_num'])
      .groupby('Category', as_index=False)
      .agg(list)
)

Output

         Category                                  Rank   rank_num
0  category_name1  [[1, 32512], [2, 12345], [3, 32382]]  [1, 2, 3]
1  category_name2  [[1, 24623], [3, 12345], [9, 25318]]  [1, 3, 9]
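
If the helper column is not wanted in the result, one possible variant (a sketch, assuming the all_categories data from the question) drops rank_num before regrouping. The name result is only illustrative.

```python
import pandas as pd

all_categories = {
    "category_name1": [[2, 12345], [1, 32512], [3, 32382]],
    "category_name2": [[3, 12345], [9, 25318], [1, 24623]],
}
df = pd.DataFrame(list(all_categories.items()), columns=["Category", "Rank"])

result = (
    df.explode("Rank")                              # one row per [rank, id] pair
      .assign(rank_num=lambda d: d["Rank"].str[0])  # helper column holding the rank
      .sort_values(["Category", "rank_num"])        # order the pairs within each category
      .drop(columns="rank_num")                     # helper no longer needed
      .groupby("Category", as_index=False)
      .agg(list)                                    # collect the pairs back into lists
)

print(result)
```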

4 answers

solution1
3 ACCEPTED 2021-06-14 03:57:39

solution2
1 2021-06-14 03:31:04

solution3
0 2021-06-13 15:16:09

solution4
0 2021-06-14 04:36:35
