I want to sort a nested dict in Python via pandas.
import pandas as pd
# Data structure (dict of nested lists):
# {
# category_name: [[rank, id], ...],
# ...
# }
all_categories = {
"category_name1": [[2, 12345], [1, 32512], [3, 32382]],
"category_name2": [[3, 12345], [9, 25318], [1, 24623]]
}
df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank'])
df.sort_values(['Rank'], ascending=True, inplace=True) # this only sorts the list of lists
Can anyone tell me how I can get to my goal? With pandas it's possible to call sort_values() on the second column, but I can't figure out how to sort the nested lists inside each row.
I want to sort ascending by the rank, not the id.
The fastest option is to apply sort() (note that the sorting occurs in place, so don't assign back to df.Rank in this case):
df.Rank.apply(list.sort)
Or apply sorted() with a custom key and assign back to df.Rank:
df.Rank = df.Rank.apply(lambda row: sorted(row, key=lambda x: x[0]))
Output in either case:
>>> df
Category Rank
0 category_name1 [[1, 32512], [2, 12345], [3, 32382]]
1 category_name2 [[1, 24623], [3, 12345], [9, 25318]]
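To make the difference between the two calls concrete, here is a minimal sketch using the data from the question. The names df_inplace, df_keyed and returned are only illustrative. One detail worth noting: list.sort compares whole [rank, id] pairs, so ties on rank would be broken by id, whereas the key version orders by rank alone.

```python
import copy
import pandas as pd

all_categories = {
    "category_name1": [[2, 12345], [1, 32512], [3, 32382]],
    "category_name2": [[3, 12345], [9, 25318], [1, 24623]],
}

# Deep-copy the data so the two frames do not share the same inner lists.
df_inplace = pd.DataFrame(list(copy.deepcopy(all_categories).items()),
                          columns=["Category", "Rank"])
df_keyed = pd.DataFrame(list(copy.deepcopy(all_categories).items()),
                        columns=["Category", "Rank"])

# list.sort sorts each cell in place and returns None,
# so the Series returned by apply() holds no useful data.
returned = df_inplace["Rank"].apply(list.sort)

# sorted() builds a new list per cell, so the result must be assigned back.
df_keyed["Rank"] = df_keyed["Rank"].apply(
    lambda row: sorted(row, key=lambda pair: pair[0])
)

print(df_inplace["Rank"].tolist())
print(df_keyed["Rank"].tolist())
```

For this data both frames end up with the same ordering, because no two pairs in a row share a rank.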
This is the perfplot of sort() vs sorted() vs explode():
import perfplot
def explode(df):
    df = df.explode('Rank')
    df['rank_num'] = df.Rank.str[0]
    df = df.sort_values(['Category', 'rank_num']).groupby('Category', as_index=False).agg(list)
    return df

def apply_sort(df):
    df.Rank.apply(list.sort)
    return df

def apply_sorted(df):
    df.Rank = df.Rank.apply(lambda row: sorted(row, key=lambda x: x[0]))
    return df

perfplot.show(
    setup=lambda n: pd.concat([df] * n),
    n_range=[2 ** k for k in range(25)],
    kernels=[explode, apply_sort, apply_sorted],
    equality_check=None,
)
To sort only the rows whose lists reach a certain length, mask the rows with str.len() and loc[]:
mask = df.Rank.str.len().ge(10)
df.loc[mask, 'Rank'].apply(list.sort)
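As a small self-contained sketch of the masking idea, the example below uses made-up rows (the "short" and "long" categories and a threshold of 3 are assumptions chosen for illustration, not values from the question). It shows that only the masked rows are sorted.

```python
import pandas as pd

# Hypothetical data: one row with two pairs, one row with three pairs.
df = pd.DataFrame({
    "Category": ["short", "long"],
    "Rank": [
        [[2, 10], [1, 20]],
        [[3, 30], [1, 40], [2, 50]],
    ],
})

# str.len() also works on list cells and gives the number of pairs per row.
mask = df["Rank"].str.len().ge(3)

# Only the rows selected by the mask are sorted in place;
# the inner lists are shared with df, so df sees the change.
df.loc[mask, "Rank"].apply(list.sort)

print(df["Rank"].tolist())
```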
Try:
df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank']).explode('Rank')
df['rank_num'] = df['Rank'].str[0]  # the rank is the first element of each [rank, id] pair
df = df.sort_values(['Category', 'rank_num']).groupby('Category')['Rank'].agg(list).reset_index()
To convert back to a dict:
dict(df.agg(list, axis=1).values)
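If the frame has exactly the 'Category' and 'Rank' columns, zipping the two columns is another way to rebuild the dict. This is only a sketch; the frame below is typed in by hand to stand for the already sorted result, and sorted_categories is an illustrative name.

```python
import pandas as pd

# Stand-in for the sorted frame produced by the steps above.
df = pd.DataFrame({
    "Category": ["category_name1", "category_name2"],
    "Rank": [
        [[1, 32512], [2, 12345], [3, 32382]],
        [[1, 24623], [3, 12345], [9, 25318]],
    ],
})

# Pair each category name with its list of [rank, id] pairs.
sorted_categories = dict(zip(df["Category"], df["Rank"]))
print(sorted_categories)
```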
Try:
df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank'])
df.set_index('Rank', inplace=True)
df.sort_index(inplace=True)
df.reset_index(inplace=True)
Or:
df = pd.DataFrame(all_categories.items(), columns=['Category', 'Rank'])
df = df.set_index('Rank').sort_index().reset_index()
It is much more efficient to use df.explode and then sort the values. It will be vectorized.
df = df.explode('Rank')
df['rank_num'] = df.Rank.str[0]
(
    df.sort_values(['Category', 'rank_num'])
      .groupby('Category', as_index=False)
      .agg(list)
)
Output
Category Rank rank_num
0 category_name1 [[1, 32512], [2, 12345], [3, 32382]] [1, 2, 3]
1 category_name2 [[1, 24623], [3, 12345], [9, 25318]] [1, 3, 9]
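The output above still carries the helper column rank_num. If only the sorted lists are wanted, one possible variation (a sketch, not part of the original answer) is to aggregate just the 'Rank' column after sorting. The name result is illustrative.

```python
import pandas as pd

all_categories = {
    "category_name1": [[2, 12345], [1, 32512], [3, 32382]],
    "category_name2": [[3, 12345], [9, 25318], [1, 24623]],
}
df = pd.DataFrame(list(all_categories.items()), columns=["Category", "Rank"])

result = (
    df.explode("Rank")                               # one [rank, id] pair per row
      .assign(rank_num=lambda d: d["Rank"].str[0])   # helper column holding the rank
      .sort_values(["Category", "rank_num"])
      .groupby("Category")["Rank"]                   # aggregate only the pairs
      .agg(list)
      .reset_index()
)
print(result)
```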