I'm trying to get the top x largest values from each column of a pandas DataFrame. Each column is one date and each row is a different stock ticker (see photo).
Ideally, I'd like to see the ticker and value of the top 5 for each date (column).
I have tried a few different iterators, but none of them worked while keeping the index.
The output I want is a new CSV with the date and the top 5 stock tickers (the index), based on their values in that day's column.
import pandas as pd
df = pd.read_csv (see photo)
I haven't been able to get it to turn out right.
Apply pd.Series.nlargest to each column to mask everything but the top N values. Then unstack and drop the NaN values. I'll use the top 2 values here for illustration.
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 10, (4, 3)),
                  columns=['Date1', 'Date2', 'Date3'],
                  index=['Stock1', 'Stock2', 'Stock3', 'Stock4'])
#            Date1     Date2     Date3
#Stock1   4.967142 -1.382643  6.476885
#Stock2  15.230299 -2.341534 -2.341370
#Stock3  15.792128  7.674347 -4.694744
#Stock4   5.425600 -4.634177 -4.657298
df.apply(pd.Series.nlargest, n=2).unstack().dropna()
#Date1  Stock2    15.230299
#       Stock3    15.792128
#Date2  Stock1    -1.382643
#       Stock3     7.674347
#Date3  Stock1     6.476885
#       Stock2    -2.341370
#dtype: float64
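Since the question asks for a CSV, here is a minimal sketch of one way to take the result above a step further: name the two index levels, flatten them into ordinary columns, rank each date's tickers from largest to smallest, and write the table out. The column names date, ticker and value and the file name top_n_by_date.csv are my own choices for illustration, not anything from the question, and n should be set to 5 for the real data.

```python
import numpy as np
import pandas as pd

# Same toy data as above; replace with the real DataFrame
# (tickers as the index, one column per date).
np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 10, (4, 3)),
                  columns=['Date1', 'Date2', 'Date3'],
                  index=['Stock1', 'Stock2', 'Stock3', 'Stock4'])

n = 2  # use n = 5 for the top 5 per date

# Keep only the top n values per column, then go from wide to long.
top = df.apply(pd.Series.nlargest, n=n).unstack().dropna()

# Name the index levels (hypothetical names) and flatten to columns.
top.index.names = ['date', 'ticker']
top = top.rename('value').reset_index()

# Within each date, list the largest value first.
top = (top.sort_values(['date', 'value'], ascending=[True, False])
          .reset_index(drop=True))

# Hypothetical output file name.
top.to_csv('top_n_by_date.csv', index=False)
```

This yields one row per (date, ticker) pair, so each date contributes n rows. Note that nlargest uses keep='first' by default, so ties at the cutoff are broken by row order rather than returning extra rows.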