I am trying to merge two DataFrames on their date index, with the result sorted by date. Is this possible?
Here are the data sources and sample code for what I am working with:
Link for sg_df: https://query1.finance.yahoo.com/v7/finance/download/%5ESTI?P=^STI?period1=1442102400&period2=1599955200&interval=1mo&events=history
Link for facemask_compliance_df: https://today.yougov.com/topics/international/articles-reports/2020/05/18/international-covid-19-tracker-update-18-may (YouGov COVID-19 behaviour changes tracker: Wearing a face mask when in public places)
import pandas as pd
from datetime import datetime

# Singapore Index
# Read the file
sg_df = pd.read_csv("^STI.csv")
# Parse the Date column (day/month/year)
conv = lambda x: datetime.strptime(x, "%d/%m/%Y")
sg_df["Date"] = sg_df["Date"].apply(conv)
# Sort by date and index on the Date column for easy referencing
sg_df.sort_values("Date", inplace=True)
sg_df.set_index("Date", inplace=True)
# Will wear face mask in public
# Read the file
facemask_compliance_df = pd.read_csv("yougov-chart.csv")
# Parse the DateTime column, then remove the time
convert1 = lambda x: datetime.strptime(x, "%d/%m/%Y %H:%M")
facemask_compliance_df["DateTime"] = facemask_compliance_df["DateTime"].apply(convert1).dt.date
# Sort by date and index on the DateTime column for easy referencing
facemask_compliance_df.sort_values("DateTime", inplace=True)
facemask_compliance_df.set_index("DateTime", inplace=True)
sg_df = sg_df.merge(facemask_compliance_df["Singapore"], left_index = True, right_index = True, how = "outer").sort_index()
I would like the output to be a table like this.
Let me know if you need any more information and I will provide it as soon as I can.
Edit:
This is the issue
data from yougov-chart
I think the merge is picking up dates even when they have no data for Singapore.
Use:
1. merge to merge two tables.
1.1. The on parameter chooses which column to merge on:
Column or index level names to join on. These must be found in both DataFrames. If on is None and not merging on indexes then this defaults to the intersection of the columns in both DataFrames.
1.2. The how="outer" option:
outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically.
2. sort_values to sort by date.
import pandas as pd
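df1 = pd.read_csv("^STI.csv")
# The question's dates are day-first (dd/mm/yyyy), so say so explicitly
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
df2 = pd.read_csv("yougov-chart.csv")
# Drop the time of day so the keys line up with the daily dates in df1
df2['Date'] = pd.to_datetime(df2['DateTime'], dayfirst=True).dt.normalize()
result = df2.merge(df1, on='Date', how='outer')
result = result.sort_values('Date')
print(result)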
Output:
        Date  US_GDP_Thousands  Mask Compliance
6 2016-02-01               NaN             37.0
7 2017-07-01               NaN             73.0
8 2019-10-01               NaN             85.0
0 2020-02-21              50.0             27.0
1 2020-03-18              55.0              NaN
2 2020-03-19              60.0              NaN
3 2020-03-25              65.0              NaN
4 2020-04-03              70.0              NaN
5 2020-05-14              75.0              NaN
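To make the effect of how="outer" easier to see, here is a minimal sketch on two tiny frames. The frame names, the Close column and all values are made up for illustration and are not taken from the real CSV files. The indicator=True argument adds a _merge column that records which side each row came from.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; every value here is invented.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2020-02-21", "2020-03-18", "2020-03-19"]),
    "Close": [100.0, 101.0, 102.0],
})
masks = pd.DataFrame({
    "Date": pd.to_datetime(["2020-02-21", "2020-03-25"]),
    "Singapore": [27.0, 48.0],
})

# Outer join: keep every date that appears in either frame.
outer = (
    prices.merge(masks, on="Date", how="outer", indicator=True)
    .sort_values("Date")
    .reset_index(drop=True)
)

# Inner join: keep only dates present in both frames.
inner = prices.merge(masks, on="Date", how="inner")

print(outer)
print(inner)
```

If the NaN rows are not wanted, how="inner" (or how="left" to keep every row of the left frame) may be closer to the intended result.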
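First use the parse_dates and index_col parameters of read_csv to get a DatetimeIndex in both DataFrames, and in the second one remove the times with DatetimeIndex.floor:
import pandas as pd

# dayfirst=True because the question's dates are written dd/mm/yyyy
sg_df = pd.read_csv("^STI.csv",
                    parse_dates=['Date'],
                    dayfirst=True,
                    index_col=['Date'])
facemask_compliance_df = pd.read_csv("yougov-chart.csv",
                                     parse_dates=['DateTime'],
                                     dayfirst=True,
                                     index_col=['DateTime'])
# DateTime is now the index, not a column, so floor the index itself
facemask_compliance_df.index = facemask_compliance_df.index.floor('d')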
Then use DataFrame.merge on the indexes with an outer join, and sort the result with DataFrame.sort_index:
df = sg_df.merge(facemask_compliance_df,
                 left_index=True,
                 right_index=True,
                 how='outer').sort_index()
print(df)
            Mask Compliance  US_GDP_Thousands
Date
2016-02-01             37.0               NaN
2017-07-01             73.0               NaN
2019-10-01             85.0               NaN
2020-02-21             27.0              50.0
2020-03-18              NaN              55.0
2020-03-19              NaN              60.0
2020-03-25              NaN              65.0
2020-04-03              NaN              70.0
2020-05-14              NaN              75.0
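The edit to the question suggests that dates with no Singapore reading still end up in the merged result. One possible way to avoid that, sketched here under the assumption that the tracker file has one column per country with NaN where a country was not surveyed, is to drop the missing Singapore values before merging. The data below, including the UK and Close columns, is invented for illustration.

```python
import pandas as pd

# Hypothetical tracker data: one column per country, NaN where not surveyed.
facemask_compliance_df = pd.DataFrame(
    {"Singapore": [27.0, None, 48.0], "UK": [None, 5.0, 8.0]},
    index=pd.to_datetime(["2020-02-21", "2020-03-01", "2020-03-25"]),
)
# Hypothetical index data with a single made-up Close column.
sg_df = pd.DataFrame(
    {"Close": [100.0, 101.0]},
    index=pd.to_datetime(["2020-02-21", "2020-03-18"]),
)

# Keep only the dates on which Singapore actually has a value.
singapore = facemask_compliance_df["Singapore"].dropna()

# Outer join on the index, then sort by date.
merged = sg_df.merge(
    singapore, left_index=True, right_index=True, how="outer"
).sort_index()
print(merged)
```

In this sketch 2020-03-01 no longer appears in the result, because only the UK had a value on that date.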
If I remember right, in NumPy you can use np.vstack or np.hstack, depending on how you want to join the arrays together.
In pandas there is pd.concat (see https://pandas.pydata.org/docs/user_guide/merging.html), which I have used for merging DataFrames.
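As a rough sketch of the pd.concat route, assuming both frames are already indexed by date: concatenating along axis=1 aligns rows on the index and, by default, keeps the union of dates. The frame names and values below are invented for illustration.

```python
import pandas as pd

# Hypothetical date-indexed frames; every value here is invented.
left = pd.DataFrame(
    {"Close": [100.0, 101.0]},
    index=pd.to_datetime(["2020-02-21", "2020-03-18"]),
)
right = pd.DataFrame(
    {"Singapore": [27.0, 48.0]},
    index=pd.to_datetime(["2020-02-21", "2020-03-25"]),
)

# axis=1 places the frames side by side, aligned on the date index.
# The default join="outer" keeps dates found in either frame.
combined = pd.concat([left, right], axis=1).sort_index()
print(combined)
```

Unlike np.vstack and np.hstack, which join arrays purely by position, pd.concat matches rows by index label, so dates missing from one frame show up as NaN rather than shifting the data.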