
Locating minimum date based on column equal to specific value in pandas dataframe?

I have a dataframe that looks something like this:

       Date     Account             Symbol  Name                                     Transaction type
0   2020-06-24  Vanguard Brokerage  VSGAX   VANGUARD SMALL CAP GROWTH INDEX ADMIRAL CL   Dividend
1   2020-06-24  Vanguard Brokerage  VSGAX   VANGUARD SMALL CAP GROWTH INDEX ADMIRAL CL   Reinvestment
2   2020-06-24  Vanguard Brokerage  VTSAX   VANGUARD TOTAL STOCK MARKET INDEX ADMIRAL C  Dividend
3   2020-06-24  Vanguard Brokerage  VTSAX   VANGUARD TOTAL STOCK MARKET INDEX ADMIRAL    Reinvestment
4   2020-06-19  Vanguard Brokerage  VHYAX   VANGUARD HIGH DIVIDEND YIELD INDEX ADMIRAL   Dividend
5   2020-06-19  Vanguard Brokerage  VHYAX   VANGUARD HIGH DIVIDEND YIELD INDEX ADMIRAL   Reinvestment
7   2020-06-16  Vanguard Brokerage  VHYAX   VANGUARD HIGH DIVIDEND YIELD INDEX ADMIRAL   Buy
8   2020-06-16  Vanguard Brokerage  VSGAX   VANGUARD SMALL CAP GROWTH INDEX ADMIRAL CL   Buy
9   2020-06-16  Vanguard Brokerage  VTSAX   VANGUARD TOTAL STOCK MARKET INDEX ADMIRAL C  Buy

I'd like to pull the earliest date for each symbol that has the transaction type 'Buy' and put that info into a dictionary. I'm not sure whether it's better to use .groupby() or a for-loop.

I've currently been trying to use a loop to iterate over all the rows and pull out every transaction that equals 'Buy'. After that, I've been trying to figure out how to pull the minimum date out of that new set of data and put it in a dictionary. Here is what I currently have:

import pandas as pd

excel_file_1 = 'Stock.Activity.xlsm'

# Putting the Excel sheets into dataframes
df_vang_brok = pd.read_excel(excel_file_1, sheet_name='Vanguard.Brokerage')
df_vang_ira = pd.read_excel(excel_file_1, sheet_name='Vanguard.IRA')
df_schwab_brok = pd.read_excel(excel_file_1, sheet_name='Schwab.Brokerage')

#Combining data frames into one 
df_all = pd.concat([df_vang_brok, df_vang_ira, df_schwab_brok])

df_early = {}
for index, row in df_all.iterrows():
    if row['Transaction type'] == 'Buy':
        print(row['Date'], row['Symbol'], row['Amount'])
        # This overwrites df_early on every matching row,
        # so only the last 'Buy' row survives the loop
        df_early = {'Date': row['Date'], 'Symbol': row['Symbol'],
                    'Amount': row['Amount']}
print(df_early)

I get the output:

2017-07-17 00:00:00 VSGAX -678.93
2017-07-05 00:00:00 VTSAX -1915.76
2017-07-03 00:00:00 VTYAX -3022.93
{'Date': Timestamp('2017-07-03 00:00:00'), 'Symbol': 'VTYAX', 'Amount': -3022.93}

It successfully pulls all transactions with 'Buy' from the dataframe, but how do I pull the earliest date after this and put it in my df_early dictionary?

Is this even the best/most efficient way to go about this?

Thanks!

Something like this?

df_all.loc[df_all['Transaction type'] == 'Buy'].groupby('Symbol')['Date'].min()

The first part (before .groupby()) selects all rows where 'Transaction type' is 'Buy'; then you group that filtered dataframe by 'Symbol', select the 'Date' column, and apply min() to it. If you want all the other columns as well, you can wrap the above in another df.loc[].
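Here is a runnable sketch of that idea, using made-up sample data that mirrors the question's table (column names follow the question's capitalization); Series.to_dict() then produces the dictionary the asker wanted:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table
df = pd.DataFrame({
    'Date': pd.to_datetime(['2020-06-16', '2020-06-16', '2017-07-17',
                            '2020-06-24', '2020-06-19']),
    'Symbol': ['VHYAX', 'VTSAX', 'VSGAX', 'VSGAX', 'VHYAX'],
    'Transaction type': ['Buy', 'Buy', 'Buy', 'Dividend', 'Reinvestment'],
})

# Keep only 'Buy' rows, then take the earliest Date per Symbol
earliest = df.loc[df['Transaction type'] == 'Buy'].groupby('Symbol')['Date'].min()

# Convert the resulting Series to a {Symbol: Timestamp} dictionary
earliest_dict = earliest.to_dict()
print(earliest_dict)
# e.g. {'VHYAX': Timestamp('2020-06-16'), 'VSGAX': Timestamp('2017-07-17'), ...}
```

This avoids iterrows() entirely, which is both shorter and faster than looping row by row.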

I am still learning, so maybe I am dead wrong; it's hard to test these things, but I will play around a bit :)
