
Any way to correctly merge two time series with different dims in pandas?

I intend to join two time series with different dimensions in pandas. The first time series is daily COVID-19 case data, while the second is daily cut statistics from food processing plants; after merging them, I also want to join the result with another dataset on a common column. First off, I want to join them by date with a certain specification: the COVID case series is recorded at the county level, while the daily cut series gives either an average daily cut per county or a uniformly distributed value. To make joining these two time series more logical, I did some aggregation and tried to join them, but it is not working as I expected. Can anyone suggest a possible way to make this happen in pandas?

Current attempt & reproducible data

Here is the covid time series data in a gist (derived from the NYT COVID-19 data) and the daily cut time series from the food processing agency. Here is my current attempt:

import pandas as pd

df1 = pd.read_csv("us_covid_by_counties.csv")
df1 = df1.drop(columns=['Unnamed: 0'])  # note: drop(..., inplace=True) returns None, so don't assign its result

df2 = pd.read_csv("daily_cut.csv")
df2 = df2.drop(columns=['Unnamed: 0'])

## process and aggregate covid time series
ctyList = list(df1['county'].unique())
df1_new = {}
for c in ctyList:
    cty_df = df1[df1['county'] == c].copy()  # copy to avoid SettingWithCopyWarning
    cty_df['new_cases'] = cty_df['cases'].diff()
    cty_df['new_deaths'] = cty_df['deaths'].diff()
    df1_new[c] = cty_df

df1_new = pd.concat(df1_new.values())  # stack the per-county frames back into one DataFrame

Then I tried to merge them this way:

df_merged = pd.concat([df1_new , df2]).sort_values('date').reset_index(drop=True)
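Since `concat` only stacks the two frames on top of each other, a key-based join is probably what is wanted here. A minimal sketch with invented toy data (column names are assumptions based on the CSVs, not the real files):

```python
import pandas as pd

# toy stand-ins for df1_new and df2 (hypothetical values)
covid = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-01", "2020-05-01", "2020-05-02"]),
    "county": ["Adams", "Boone", "Adams"],
    "new_cases": [3, 5, 2],
})
cuts = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-01", "2020-05-02"]),
    "county": ["Adams", "Adams"],
    "daily_cut": [100.0, 90.0],
})

# align rows on the shared keys instead of stacking them with concat
merged = pd.merge(covid, cuts, on=["date", "county"], how="left")
```

Counties with no cut statistics for a given date simply get NaN in `daily_cut`, which is usually easier to reason about than interleaved rows from `concat`.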

Update:

If merging df1_new and df2 can be done correctly, I then want to join df_merged with this data on county_state . Is there any way to get this right in pandas?

I am having a hard time joining these two time series correctly. Can anyone suggest a possible approach?
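For the follow-up join on county_state, an ordinary key merge should be enough. A sketch with made-up frames (the column names and values here are assumptions for illustration):

```python
import pandas as pd

# hypothetical merged covid/cut frame carrying a county_state key
df_merged = pd.DataFrame({
    "county_state": ["Adams_IL", "Boone_IL"],
    "new_cases": [3, 5],
})
# hypothetical extra dataset keyed the same way
extra = pd.DataFrame({
    "county_state": ["Adams_IL", "Boone_IL"],
    "plant_size": [120, 80],
})

# left join keeps every row of df_merged and pulls in matching columns
final = df_merged.merge(extra, on="county_state", how="left")
```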

To complete @XXavier's suggestion in the comments:

Make sure you import dates correctly:

df1 = pd.read_csv('data/us_covid_by_counties.csv', parse_dates=['date']).drop(columns=['Unnamed: 0'])
df2 = pd.read_csv('data/daily_cut.csv', parse_dates=['date']).drop(columns=['Unnamed: 0'])

Add the columns you want:

df1['new_cases'] = df1.groupby(['county'])['cases'].diff()
df1['new_deaths'] = df1.groupby(['county'])['deaths'].diff()
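On invented toy data, the per-county diff behaves like this (the first row of each county has no previous day, so its diff is NaN):

```python
import pandas as pd

df = pd.DataFrame({
    "county": ["A", "A", "B", "B"],
    "cases": [10, 15, 3, 7],
})
# diff restarts within each county group, never across group boundaries
df["new_cases"] = df.groupby("county")["cases"].diff()
```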

Create the merged df:

df_merged = pd.merge_asof(df1.sort_values('date'), df2.sort_values('date'), on="date", direction='nearest')
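Note that `merge_asof` requires both frames to be sorted on the `on` key, and passing `by=` keeps each county matched only against its own rows. A sketch with invented data:

```python
import pandas as pd

left = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-01", "2020-05-03"]),
    "county": ["Adams", "Adams"],
    "new_cases": [3, 2],
}).sort_values("date")  # merge_asof raises if the key is unsorted
right = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-02"]),
    "county": ["Adams"],
    "daily_cut": [100.0],
}).sort_values("date")

# each left row takes the nearest-dated right row within the same county
out = pd.merge_asof(left, right, on="date", by="county", direction="nearest")
```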

In your original question you mentioned two dataframes; in your comment you mentioned another one. Is that a different question? The merge_asof works for your original dataset. Please see below.

Here is the second dataframe (shown as a screenshot in the original post).

This is to change the datatype to datetime:

df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])

Here is the output I got (shown as a screenshot in the original post).
