
Rank elements in a data frame while keeping index

I'm using the following formula to get the top 20 elements for each row in a data frame. It works, but it drops the index of df_returns, and I'd like to keep it. df_returns uses dates as its index, and I'd like the same dates to label the corresponding rows in the df_rank data frame.

df_rank = pd.DataFrame({n: df_returns.T[col].nlargest(21).index.tolist() for n, col in enumerate(df_returns.T)}).T

For example, let's say I wanted to get the top 3 from the following data frame:

           A   B   C   D   E
1/1/2014   5   4   6   8   1
2/1/2014   2   1   6   3   1
3/1/2014   8   2   3   5   1

The results I'm getting currently are:

0   D   C   A
1   C   D   A
2   A   D   C

The results I'd like to get are:

1/1/2014   D   C   A
2/1/2014   C   D   A
3/1/2014   A   D   C

You can use set_index to give df_rank the index of the original data frame:

df_rank = df_rank.set_index(df_returns.index)
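Here is a minimal sketch of that on the example data from the question, assuming the dates are stored as plain strings and using nlargest(3) in place of the question's 21 so it matches the top-3 example. Note that set_index returns a new data frame rather than modifying df_rank in place, so the result has to be assigned back.

```python
import pandas as pd

# Example data from the question; the date strings as index are an assumption.
df_returns = pd.DataFrame(
    {"A": [5, 2, 8], "B": [4, 1, 2], "C": [6, 6, 3], "D": [8, 3, 5], "E": [1, 1, 1]},
    index=["1/1/2014", "2/1/2014", "3/1/2014"],
)

# The question's formula, with 3 instead of 21 for the small example.
# enumerate() replaces each date with 0, 1, 2, which is why the index is lost.
df_rank = pd.DataFrame(
    {n: df_returns.T[col].nlargest(3).index.tolist() for n, col in enumerate(df_returns.T)}
).T

# Put the original dates back; set_index returns a new data frame.
df_rank = df_rank.set_index(df_returns.index)
print(df_rank)
```

This only works because df_rank has exactly one row per row of df_returns, in the same order.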

If you want to apply a function to each row of a data frame, apply is often your best bet, and it keeps the original index (I've rewritten your function a bit too; d is the example data frame):

d.apply(lambda r: r.sort_values(ascending=False).iloc[:3].index.tolist(), axis=1)

Out[88]:
1/1/2014    [D, C, A]
2/1/2014    [C, D, A]
3/1/2014    [A, D, C]
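The result above is a Series of lists rather than the table layout asked for in the question. If you want one column per rank, one possible way (a sketch, not the only option) is to expand the lists into a new data frame and pass the original index along. The names top_n and ranked are made up for this illustration, and the date strings as index are an assumption.

```python
import pandas as pd

# Example data from the question; the date strings as index are an assumption.
d = pd.DataFrame(
    {"A": [5, 2, 8], "B": [4, 1, 2], "C": [6, 6, 3], "D": [8, 3, 5], "E": [1, 1, 1]},
    index=["1/1/2014", "2/1/2014", "3/1/2014"],
)

top_n = 3  # hypothetical parameter; the question's real data would use a larger value

# Row-wise apply keeps d's index; each row yields a list of column labels.
ranked = d.apply(
    lambda r: r.sort_values(ascending=False).head(top_n).index.tolist(), axis=1
)

# Expand the lists into columns 0..top_n-1, reusing the date index.
df_rank = pd.DataFrame(ranked.tolist(), index=ranked.index)
print(df_rank)
```

If a row has tied values around the cut-off, which of the tied columns comes first depends on the sort, so don't rely on the order among ties.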
