Python (Pandas) How to merge 2 dataframes with different dates in incremental order?

I am trying to merge 2 dataframes by date index in order. Is this possible?

Here is a sample of the code I am working with:

Link for sg_df: https://query1.finance.yahoo.com/v7/finance/download/%5ESTI?P=^STI?period1=1442102400&period2=1599955200&interval=1mo&events=history

Link for facemask_compliance_df: https://today.yougov.com/topics/international/articles-reports/2020/05/18/international-covid-19-tracker-update-18-may (YouGov COVID-19 behaviour changes tracker: Wearing a face mask when in public places)

import pandas as pd
from datetime import datetime

# Singapore Index
# Read file
# Format Date
# index date column for easy referencing
sg_df = pd.read_csv("^STI.csv")
conv = lambda x: datetime.strptime(x, "%d/%m/%Y")
sg_df["Date"] = sg_df["Date"].apply(conv)
sg_df.sort_values("Date", inplace = True)
sg_df.set_index("Date", inplace = True)

# Will wear face mask in public
# Read file
# Format Date, Removing time
# index date column for easy referencing
facemask_compliance_df = pd.read_csv("yougov-chart.csv")
convert1 = lambda x: datetime.strptime(x, "%d/%m/%Y %H:%M") 
facemask_compliance_df["DateTime"] = facemask_compliance_df["DateTime"].apply(convert1).dt.date
facemask_compliance_df.sort_values("DateTime", inplace = True)
facemask_compliance_df.set_index("DateTime", inplace = True)

sg_df = sg_df.merge(facemask_compliance_df["Singapore"], left_index = True, right_index = True, how = "outer").sort_index()

I would like the output to be a table like this:

[image]

Please let me know if you need any more information; I will provide it as soon as I can.

Edit:

This is the issue

[image]

Data from yougov-chart.csv:

[image]

I think the merge is picking up dates even when there is no data for Singapore.

Use:

  1. merge to merge two tables.

1.1. on to choose which column to merge on:

Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.

1.2. outer option:

outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.

  2. sort_values to sort by date
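import pandas as pd

df1 = pd.read_csv("^STI.csv")
# dayfirst=True because the question's dates are day/month/year
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)

df2 = pd.read_csv("yougov-chart.csv")
# normalize() drops the time part so the keys can match df1's dates
df2['Date'] = pd.to_datetime(df2['DateTime'], dayfirst=True).dt.normalize()

# outer join keeps the dates from both frames
result = df2.merge(df1, on='Date', how='outer')
result = result.sort_values('Date')

print(result)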

Output:

        Date  US_GDP_Thousands  Mask Compliance
6 2016-02-01               NaN             37.0
7 2017-07-01               NaN             73.0
8 2019-10-01               NaN             85.0
0 2020-02-21              50.0             27.0
1 2020-03-18              55.0              NaN
2 2020-03-19              60.0              NaN
3 2020-03-25              65.0              NaN
4 2020-04-03              70.0              NaN
5 2020-05-14              75.0              NaN
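
To address the edit in the question (dates showing up that have no Singapore reading), one option is to select only the Singapore column and drop its missing rows before the outer merge. The following is a minimal sketch, not the asker's actual data: the two frames are small invented stand-ins for the CSV files, and the "Close" and "Other" columns are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (all values are invented).
sg_df = pd.DataFrame(
    {"Close": [3100.0, 3200.0]},
    index=pd.DatetimeIndex(["2020-02-01", "2020-03-01"], name="Date"),
)
facemask_compliance_df = pd.DataFrame(
    {"Singapore": [27.0, None, 40.0], "Other": [10.0, 12.0, None]},
    index=pd.DatetimeIndex(
        ["2020-02-21", "2020-02-25", "2020-03-01"], name="DateTime"
    ),
)

# Keep only the Singapore column and drop dates where it has no reading,
# so dates that only exist for other countries never enter the merge.
singapore = facemask_compliance_df["Singapore"].dropna()

# Outer join on the index keeps every remaining date from both sides.
merged = sg_df.merge(
    singapore, left_index=True, right_index=True, how="outer"
).sort_index()

print(merged)
```

In this sketch, 2020-02-25 has no Singapore value, so it does not appear in the merged result, while dates present in either the index data or the Singapore column are kept.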

First, pass parse_dates and index_col to read_csv so that both frames get a DatetimeIndex, then remove the times from the second index with DatetimeIndex.floor:

# dayfirst=True because the question's dates are day/month/year
sg_df = pd.read_csv("^STI.csv", 
                    parse_dates=['Date'], 
                    dayfirst=True,
                    index_col='Date')

facemask_compliance_df = pd.read_csv("yougov-chart.csv", 
                                     parse_dates=['DateTime'],
                                     dayfirst=True,
                                     index_col='DateTime')
# DateTime is now the index, not a column, so floor the index itself
facemask_compliance_df.index = facemask_compliance_df.index.floor('D')

Then merge on the index with DataFrame.merge using an outer join, and sort the result with DataFrame.sort_index:

df = sg_df.merge(facemask_compliance_df, 
                 left_index=True, 
                 right_index=True, 
                 how='outer').sort_index()
print(df)
            Mask Compliance  US_GDP_Thousands
Date                                         
2016-02-01             37.0               NaN
2017-07-01             73.0               NaN
2019-10-01             85.0               NaN
2020-02-21             27.0              50.0
2020-03-18              NaN              55.0
2020-03-19              NaN              60.0
2020-03-25              NaN              65.0
2020-04-03              NaN              70.0
2020-05-14              NaN              75.0
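
The reason for flooring rather than taking .dt.date, as the question's code does, can be sketched as follows. This is only an illustration with made-up timestamps: .dt.date produces plain Python date objects, whereas .dt.floor keeps the datetime64 type, which is what a DatetimeIndex on the other frame would hold.

```python
import pandas as pd

# Two invented timestamps with a time-of-day component.
stamps = pd.Series(pd.to_datetime(["2020-02-21 09:30", "2020-03-18 14:00"]))

as_dates = stamps.dt.date       # Python date objects, stored as dtype "object"
floored = stamps.dt.floor("D")  # still datetime64, with the time set to midnight

print(as_dates.dtype)
print(floored.dtype)
print(floored.iloc[0])
```

Keeping both sides as datetime64 avoids comparing two different kinds of date keys when merging on the index.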

If I remember correctly, in NumPy you can use np.vstack or np.hstack, depending on how you want to join them together.

In pandas there is pandas.concat, described at https://pandas.pydata.org/docs/user_guide/merging.html, which I have used for merging dataframes.
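
As a rough sketch of the concat approach, assuming both inputs are indexed by date: pd.concat with axis=1 aligns the inputs on their index and, by default, keeps the union of the dates. The two series below are small samples modeled on the example output shown earlier, not the real files.

```python
import pandas as pd

# Small sample series, each indexed by date.
gdp = pd.Series(
    [50.0, 55.0],
    index=pd.to_datetime(["2020-02-21", "2020-03-18"]),
    name="US_GDP_Thousands",
)
mask = pd.Series(
    [37.0, 27.0],
    index=pd.to_datetime(["2016-02-01", "2020-02-21"]),
    name="Mask Compliance",
)

# axis=1 places the series side by side; the default join="outer"
# keeps every date from either input and fills the gaps with NaN.
combined = pd.concat([gdp, mask], axis=1).sort_index()

print(combined)
```

For two inputs this gives the same kind of result as an outer merge on the index; concat is mainly convenient when there are more than two frames to line up.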
