Fastest way in Pandas to create a sorted list of column names from each row's values

Given a dataframe

A B C
3 1 2
2 1 3
3 2 1

I would like to add a new column that lists, for each row, the column names ordered by that row's values (ascending):

A B C new_col
3 1 2 [B,C,A]
2 1 3 [B,A,C]
3 2 1 [C,B,A]

This is my code. It works but is quite slow.

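import operator

col_list = ['A', 'B', 'C']  # columns to rank; assumed here, not defined in the original snippet

def blist(x):
    # Map each column name to this row's value in that column.
    col_dict = {}
    for col in col_list:
        col_dict[col] = x[col]
    # Sort the (name, value) pairs by value and keep only the names.
    sorted_tuple = sorted(col_dict.items(), key=operator.itemgetter(1))
    return [i[0] for i in sorted_tuple]

df['new_col'] = df.apply(blist, axis=1)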

I would appreciate a faster approach to this problem.

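Use np.argsort() together with np.take(). np.argsort() returns, for each row, the column positions that would sort that row's values, and np.take() maps those positions to the column labels: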

In [132]: df['new_col'] = np.take(df.columns, np.argsort(df)).tolist()

In [133]: df
Out[133]:
   A  B  C    new_col
0  3  1  2  [B, C, A]
1  2  1  3  [B, A, C]
2  3  2  1  [C, B, A]
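
For anyone running this outside an IPython session, here is a minimal self-contained sketch of the same idea. It assumes all columns are numeric. Converting to a NumPy array with to_numpy(), passing axis=1 explicitly, and using kind="stable" are choices made for this sketch rather than part of the answer above; the stable sort keeps tied values in their original column order.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"A": [3, 2, 3], "B": [1, 1, 2], "C": [2, 3, 1]})

# For each row, the column positions that sort its values in ascending order.
# kind="stable" (an assumption of this sketch) keeps ties in column order.
order = np.argsort(df.to_numpy(), axis=1, kind="stable")

# Index the column labels with those positions: one row of labels per row of df.
labels = df.columns.to_numpy()[order]

# A list of lists assigned to a single column stores one list per row.
df["new_col"] = labels.tolist()
print(df)
```

For descending order, one option is to reverse each row of `order` with `order[:, ::-1]` before indexing the labels.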

Timing for a DataFrame with 30,000 rows:

In [182]: df = pd.concat([df] * 10**4, ignore_index=True)

In [183]: df.shape
Out[183]: (30000, 3)

In [184]: %timeit df.apply(blist,axis=1)
4.84 s ± 31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [185]: %timeit np.take(df.columns, np.argsort(df)).tolist()
5.45 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Speedup ratio (apply time divided by vectorized time):

In [187]: (4.84*1000)/5.45
Out[187]: 888.0733944954128
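
These numbers depend on the machine and on the pandas/NumPy versions, so it is worth re-measuring locally. Below is a rough sketch for doing that without IPython, using the standard-library timeit module. The helper names `by_apply` and `by_argsort` are hypothetical, `by_apply` is a compact rewrite of blist rather than the exact function from the question, and the frame is kept at 3,000 rows (an assumption, to keep the run short) instead of 30,000. It also checks that both approaches return the same lists before timing them.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"A": [3, 2, 3], "B": [1, 1, 2], "C": [2, 3, 1]})
# 3,000 rows here to keep the run short; use 10**4 to match the 30,000-row test.
big = pd.concat([base] * 10**3, ignore_index=True)


def by_apply(frame):
    # Row-wise approach: sort the column names by this row's values.
    cols = list(frame.columns)
    return frame.apply(lambda row: sorted(cols, key=lambda c: row[c]), axis=1).tolist()


def by_argsort(frame):
    # Vectorized approach: argsort each row, then look up the column labels.
    order = np.argsort(frame.to_numpy(), axis=1, kind="stable")
    return frame.columns.to_numpy()[order].tolist()


# Both approaches should agree before their timings are compared.
same = by_apply(big) == by_argsort(big)

t_apply = timeit.timeit(lambda: by_apply(big), number=1)
t_argsort = timeit.timeit(lambda: by_argsort(big), number=1)

print("results match:", same)
print(f"apply:   {t_apply:.4f} s")
print(f"argsort: {t_argsort:.4f} s")
print(f"ratio:   {t_apply / t_argsort:.1f}x")
```

A single run (number=1) is noisy; raising `number` or using timeit.repeat gives steadier figures.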
