简体   繁体   中英

How to put two pandas dataframes together, with one dataframe missing dates

I have historical ticker data in one dataframe, and historical dividend data in another dataframe. They look like this:

        Date        High        Low         Open        Close       Volume      Adj Close
0       1993-01-29  43.968750   43.750000   43.968750   43.937500   1003200.0   26.453930
1       1993-02-01  44.250000   43.968750   43.968750   44.250000   480500.0    26.642057
2       1993-02-02  44.375000   44.125000   44.218750   44.343750   201300.0    26.698507
3       1993-02-03  44.843750   44.375000   44.406250   44.812500   529400.0    26.980742
4       1993-02-04  45.093750   44.468750   44.968750   45.000000   531500.0    27.093624
... ... ... ... ... ... ... ...
6809    2020-02-13  338.119995  335.559998  335.859985  337.059998  54501900.0  337.059998
6810    2020-02-14  337.730011  336.200012  337.510010  337.600006  64582200.0  337.600006
6811    2020-02-18  337.670013  335.209991  336.510010  336.730011  57226200.0  336.730011
6812    2020-02-19  339.079987  337.480011  337.790009  338.339996  48639100.0  338.339996
6813    2020-02-20  338.640015  333.681702  337.742004  336.894989  63193703.0  336.894989
            action      value
2019-12-20  DIVIDEND    1.570
2019-09-20  DIVIDEND    1.384
2019-06-21  DIVIDEND    1.432
2019-03-15  DIVIDEND    1.233
2018-12-21  DIVIDEND    1.435
... ... ...
1994-03-18  DIVIDEND    0.271
1993-12-17  DIVIDEND    0.317
1993-09-17  DIVIDEND    0.286
1993-06-18  DIVIDEND    0.318
1993-03-19  DIVIDEND    0.213

I want something like this:

        Date        High        Low         Open        Close       Volume      Adj Close       Dividend
0       1993-01-29  43.968750   43.750000   43.968750   43.937500   1003200.0   26.453930           
1       1993-02-01  44.250000   43.968750   43.968750   44.250000   480500.0    26.642057           
2       1993-02-02  44.375000   44.125000   44.218750   44.343750   201300.0    26.698507           
3       1993-02-03  44.843750   44.375000   44.406250   44.812500   529400.0    26.980742           
4       1993-02-04  45.093750   44.468750   44.968750   45.000000   531500.0    27.093624           
... ... ... ... ... ... ... ...
6809    2020-02-13  338.119995  335.559998  335.859985  337.059998  54501900.0  337.059998          1.570
6810    2020-02-14  337.730011  336.200012  337.510010  337.600006  64582200.0  337.600006          1.570
6811    2020-02-18  337.670013  335.209991  336.510010  336.730011  57226200.0  336.730011          1.570
6812    2020-02-19  339.079987  337.480011  337.790009  338.339996  48639100.0  338.339996          1.570
6813    2020-02-20  338.640015  333.681702  337.742004  336.894989  63193703.0  336.894989          1.570

(I have no dividend data before 1993-03-19, but I do onward.)

Basically, I want to merge them so that the dividend column is continuous with the dates in the historical dataframe.

You can join, then fill missing values and finally rename columns.

Supose your dataframes are named market_prices and dividend_data .

You can do this:

import pandas as pd

# Make sure the keys share type
market_prices['Date'] = pd.to_datetime(market_prices['Date'])
dividend_data.index = pd.to_datetime(dividend_data.index)

# Join the data
prices_with_dividends = market_prices.join(
    dividend_data['value'], how='left', on='Date'
)
# Fill missing values with previous data
prices_with_dividends['value'].fillna(method='ffill', inplace=True)
# Rename column
prices_with_dividends.rename({'value': 'Dividend'}, axis=1, inplace=True)

That should do.

Assuming df is the data frame with historical ticker data, and df2 is the one with dividend information.

#prerequisite:
df["Date"]=pd.to_datetime(df["Date"])

df=df.set_index("Date")
df["Dividend"]=df2["value"]
df=df.reset_index()

Interpolated from @fjsh's answer.

# Join the data
prices_with_dividends = features.set_index('Date').join(dividend_data.set_index('Date'))

# Fill missing values with previous data
prices_with_dividends['Value'].fillna(method='ffill', inplace=True)

# Rename column
prices_with_dividends.rename({'Value': 'Dividend'}, axis=1, inplace=True)
price_table = price_table.merge(dividend_table, left_on = 'Date', right_on ='Date',how = 'outer')
price.value = price.value.fillna('')

If the dates are not in the same format convert using:

price_table['Date'] = pd.to_datetime(price_table['Date'])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM