My goal is to drop the two stocks/tickers (or bottom decile) based on the volume (drop the entire row for 2 stocks/tickers (level 1) with the lowest volumes for each level 0 index Date)
The DataFrame has already been sorted by volume so for each date, its sorted by volume in ascending order. So the DataFrame would might look like ( has been shortened to 5 stocks instead of 20 ): Example of DataFrame: '''
Date Ticker col1 col2 col3 Volume
2020-01-01 stock1 - - - 5
stock2 - - - 10
stock3 - - - 20
stock4 - - - 40
stock5 - - - 43
2020-02-01 stock3 - - - 7
stock5 - - - 14
stock1 - - - 33
stock2 - - - 50
stock4 - - - 52
For level 0 index Date "2020-01-01", I would want to drop stock1 and stock2, but for the next level index Date "2020-02-01" I want to drop the new lowest 2 which are stock3 and stock5.
Note: the real DataFrame will be much bigger with more than just 5 stocks and for many more months
So far I have tried adding a decile column ( since my real goal is to do this for 20 stocks ), using qcut
,which will automatically give me the lowest two values by volume, but I wasn't able to replicated for EACH level 0 date (was only successful for one date and not sure how to do it for each level 0 date).
I also tried nsmallest
and nlargest
but encountered errors due to this being a DataFrame.
Do you have any suggestions as to how I can do this task? I feel as though I'm on the right path but I am missing something basic. Any insight is appreciated!
Since your DataFrame is already sorted by date and volume, you can drop the first 2 rows from each date group by adapting any of the answers to Python: Pandas - Delete the first row by group . For example:
# Create input data based on your example
d = {'Date': 5 * ['2020-01-01'] +
5 * ['2020-02-01'],
'Ticker': ['stock1', 'stock2', 'stock3', 'stock4', 'stock5',
'stock3', 'stock5', 'stock1', 'stock2', 'stock4'],
'col1': 10 * ['-'],
'Volume': [5, 10, 20, 40, 43, 7, 14, 33, 50, 52]}
df = pd.DataFrame(d)
# Get the first and second rows of each date group
to_del = df.groupby('Date', as_index=False).nth([0, 1])
# Intentionally duplicate the first and second rows of each date
# group, then remove them with drop_duplicates with keep=False to
# drop *all* duplicated rows without keeping first occurrences
res = pd.concat([df, to_del]).drop_duplicates(keep=False)
print(res)
Date Ticker col1 Volume
2 2020-01-01 stock3 - 20
3 2020-01-01 stock4 - 40
4 2020-01-01 stock5 - 43
7 2020-02-01 stock1 - 33
8 2020-02-01 stock2 - 50
9 2020-02-01 stock4 - 52
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.