Locating minimum date based on column equal to specific value in pandas dataframe?

I have a dataframe that looks something like this:

    Date        Account             Symbol  Name                                         Transaction type
0   2020-06-24  Vanguard Brokerage  VSGAX   VANGUARD SMALL CAP GROWTH INDEX ADMIRAL CL   Dividend
1   2020-06-24  Vanguard Brokerage  VSGAX   VANGUARD SMALL CAP GROWTH INDEX ADMIRAL CL   Reinvestment
2   2020-06-24  Vanguard Brokerage  VTSAX   VANGUARD TOTAL STOCK MARKET INDEX ADMIRAL C  Dividend
3   2020-06-24  Vanguard Brokerage  VTSAX   VANGUARD TOTAL STOCK MARKET INDEX ADMIRAL    Reinvestment
4   2020-06-19  Vanguard Brokerage  VHYAX   VANGUARD HIGH DIVIDEND YIELD INDEX ADMIRAL   Dividend
5   2020-06-19  Vanguard Brokerage  VHYAX   VANGUARD HIGH DIVIDEND YIELD INDEX ADMIRAL   Reinvestment
7   2020-06-16  Vanguard Brokerage  VHYAX   VANGUARD HIGH DIVIDEND YIELD INDEX ADMIRAL   Buy
8   2020-06-16  Vanguard Brokerage  VSGAX   VANGUARD SMALL CAP GROWTH INDEX ADMIRAL CL   Buy
9   2020-06-16  Vanguard Brokerage  VTSAX   VANGUARD TOTAL STOCK MARKET INDEX ADMIRAL C  Buy

I'd like to pull the earliest date for each symbol that has the transaction type 'Buy' and put that info into a dictionary. I'm not sure if it's better to use .groupby, or if a for-loop is more appropriate.

I've been trying to use a loop to iterate over all the rows and pull out all transactions that equal 'Buy'. After that, I've been trying to figure out how to pull the minimum date out of that new set of data and put it in a dictionary. Here is what I currently have.

import pandas as pd

excel_file_1 = 'Stock.Activity.xlsm'

# Read each sheet of the Excel file into its own dataframe
df_vang_brok = pd.read_excel(excel_file_1, sheet_name='Vanguard.Brokerage')
df_vang_ira = pd.read_excel(excel_file_1, sheet_name='Vanguard.IRA')
df_schwab_brok = pd.read_excel(excel_file_1, sheet_name='Schwab.Brokerage')

# Combine the dataframes into one
df_all = pd.concat([df_vang_brok, df_vang_ira, df_schwab_brok])

# Print every 'Buy' row. df_early is overwritten on each match,
# so it ends up holding only the last 'Buy' row seen.
df_early = {}
for index, row in df_all.iterrows():
    if row['Transaction type'] == 'Buy':
        print(row['Date'], row['Symbol'], row['Amount'])
        df_early = {'Date': row['Date'], 'Symbol': row['Symbol'],
                    'Amount': row['Amount']}
print(df_early)

I get the output:

2017-07-17 00:00:00 VSGAX -678.93
2017-07-05 00:00:00 VTSAX -1915.76
2017-07-03 00:00:00 VTYAX -3022.93
{'Date': Timestamp('2017-07-03 00:00:00'), 'Symbol': 'VTYAX', 'Amount': -3022.93}

It successfully pulls all transactions with 'Buy' from the dataframe, but how do I pull the earliest date after this and put it in my df_early dictionary?

Is this even the best/most efficient way to go about this?

Thanks!

Something like this?

df.loc[df['Transaction type'] == 'Buy'].groupby('Symbol')['Date'].min()

The first part (before .groupby()) selects all rows where 'Transaction type' is 'Buy'. You then group that dataframe by 'Symbol', select the 'Date' column and apply the min() function to it. If you want all the other columns as well, you can use the result to select the matching rows with a separate df.loc[].
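
To make the idea above concrete, here is a minimal sketch rather than a tested solution for the real spreadsheet. The sample rows and the Amount values are made up for illustration, and the names buys, earliest_by_symbol and earliest_rows are my own. It assumes the 'Date' column is already a datetime column, as it appears to be in the question's output.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-06-24", "2020-06-16", "2020-06-16",
        "2020-05-01", "2020-06-16", "2020-04-01",
    ]),
    "Symbol": ["VSGAX", "VHYAX", "VSGAX", "VSGAX", "VTSAX", "VTSAX"],
    "Transaction type": ["Dividend", "Buy", "Buy", "Buy", "Buy", "Dividend"],
    "Amount": [10.0, -100.0, -200.0, -50.0, -300.0, 5.0],
})

# Keep only the 'Buy' rows.
buys = df.loc[df["Transaction type"] == "Buy"]

# Earliest 'Buy' date per symbol, converted to a plain dict
# of the form {symbol: Timestamp}.
earliest_by_symbol = buys.groupby("Symbol")["Date"].min().to_dict()
print(earliest_by_symbol)

# To keep all other columns, look up the row holding each minimum.
# reset_index gives unique row labels, which matters after pd.concat
# because the concatenated frames can share index values.
buys = buys.reset_index(drop=True)
earliest_rows = buys.loc[buys.groupby("Symbol")["Date"].idxmin()]
print(earliest_rows)
```

The dict comes straight from the grouped minimum, so no explicit loop is needed. The idxmin() lookup is one way to do the "separate df.loc[]" step. If a symbol has several buys on its earliest date, it returns only the first of them.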

I am still learning, so maybe I am dead wrong. It is hard to try these things, but I will play around a bit :)


 