I'm using the following expression to get the top 20 elements in each row of a data frame. It works well, but it drops the index of df_returns, which I'd like to keep. df_returns is indexed by date, and I'd like the same dates to label the corresponding rows of df_rank.
df_rank = pd.DataFrame({n: df_returns.T[col].nlargest(21).index.tolist() for n, col in enumerate(df_returns.T)}).T
For example, suppose I wanted the top 3 from the following data frame:
A B C D E
1/1/2014 5 4 6 8 1
2/1/2014 2 1 6 3 1
3/1/2014 8 2 3 5 1
The results I'm getting currently are:
0 D C A
1 C D A
2 A D C
The results I'd like to get are:
1/1/2014 D C A
2/1/2014 C D A
3/1/2014 A D C
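You can use set_index to restore the original data frame's index (it returns a new data frame, so assign the result back):
df_rank = df_rank.set_index(df_returns.index)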
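To make the set_index approach concrete, here is a minimal sketch that rebuilds the question's 3-row example and reattaches the dates. The data values come from the question; using string dates for the index and nlargest(3) instead of 21 are assumptions made to match the small example.

```python
import pandas as pd

# Example data from the question; the dates (kept as plain strings here) form the index.
df_returns = pd.DataFrame(
    {"A": [5, 2, 8], "B": [4, 1, 2], "C": [6, 6, 3], "D": [8, 3, 5], "E": [1, 1, 1]},
    index=["1/1/2014", "2/1/2014", "3/1/2014"],
)

# Same construction as in the question, but taking the top 3 labels per row.
# Keying the dict by the counter n is what discards the dates.
df_rank = pd.DataFrame(
    {n: df_returns.T[col].nlargest(3).index.tolist() for n, col in enumerate(df_returns.T)}
).T

# set_index does not modify df_rank in place, so keep the returned frame.
df_rank = df_rank.set_index(df_returns.index)
print(df_rank)
```

Keying the dict comprehension by col instead of n should give the same date-labelled result without the extra set_index step, since col already holds each row's original index label.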
If you want to apply a function to each row of a data frame, apply with axis=1 is often your best bet (I've rewritten your function a bit too; d is the example data frame):
d.apply(lambda r: r.sort_values(ascending = False)[0:3].index.tolist(), axis=1)
Out[88]:
1/1/2014 [D, C, A]
2/1/2014 [C, D, A]
3/1/2014 [A, D, C]
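The apply version above returns one list per row. If separate columns are preferred, as in the desired output in the question, a possible variant is to pass result_type="expand" to apply. This is a sketch on the question's example data; the names d and top3 are illustrative, and nlargest(3) is used in place of sorting and slicing.

```python
import pandas as pd

# Example data from the question, indexed by date strings.
d = pd.DataFrame(
    {"A": [5, 2, 8], "B": [4, 1, 2], "C": [6, 6, 3], "D": [8, 3, 5], "E": [1, 1, 1]},
    index=["1/1/2014", "2/1/2014", "3/1/2014"],
)

# For each row, take the labels of the 3 largest values.
# result_type="expand" spreads each returned list across columns,
# and the original date index is preserved by apply.
top3 = d.apply(lambda r: r.nlargest(3).index.tolist(), axis=1, result_type="expand")
print(top3)
```

Because apply works row by row on the original frame, the date index carries over and no set_index call is needed afterwards.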