Interpolation of datetime data in Python numpy

Question

I have a .csv file consisting of two columns, which I have imported as a numpy array. The first column is datetime data with one entry per month. The second column is the corresponding value for that month.

I want to interpolate the data so as to create new datetime rows for every day, with a corresponding value for each day. If possible, I would also like to introduce some random noise into the interpolated values, but I know this is a lot to ask.

Here is a sample of the data:

Date,Value
01/06/2010 00:00,42.18
01/07/2010 00:00,43.53
01/08/2010 00:00,39.95
01/09/2010 00:00,41.12
01/10/2010 00:00,43.5
01/11/2010 00:00,46.4
01/12/2010 00:00,58.03
01/01/2011 00:00,48.43
01/02/2011 00:00,46.47
01/03/2011 00:00,51.41
01/04/2011 00:00,50.88
01/05/2011 00:00,50.27
01/06/2011 00:00,50.82

Thanks very much for your help. I know of scipy.interpolate, but I'm not sure whether it can work with datetime data.

Answer 1

Assuming your Date column is sorted and contains strings (not dates) in DD/MM/YYYY format, and your Value column contains floats, this is one way to get an interpolated value for every day between the first and last date:

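import datetime as dt

import numpy as np
import pandas as pd

# Assumes the CSV from the question has already been loaded into a DataFrame, e.g.:
# df = pd.read_csv("data.csv")  # hypothetical filename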

df[["Date", "Time"]] = df["Date"].str.split(' ', expand=True)
df[["Day", "Month", "Year"]] = df["Date"].str.split('/', expand=True)

first_date = np.array([int(df["Day"].iloc[0]), int(df["Month"].iloc[0]), int(df["Year"].iloc[0])]).flatten()

# Get the number of days between consecutive date entries so that each date
# can be expressed as the number of days elapsed since the first date.

col1 = df["Date"].iloc[0:len(df) - 1]
col2 = df["Date"].iloc[1:]

col1 = pd.to_datetime(col1, format='%d/%m/%Y').reset_index()
col2 = pd.to_datetime(col2, format='%d/%m/%Y').reset_index()

# Find the differences and prepend 0, because diff is one row short: it has no
# entry for the first date, which is 0 days after itself.

diff = col2 - col1
diff = diff["Date"].dt.days.cumsum()
diff = pd.concat([pd.Series([0]), diff], ignore_index=True)

# original_x holds the dates as day offsets from the first date.
original_x = diff.to_numpy().flatten()
final_x_vals = np.arange(0, original_x[-1] + 1, 1)
original_y = df["Value"].to_numpy().astype(float)

final_y_vals = np.interp(final_x_vals, original_x, original_y)

# Function to turn the final_x_vals (i.e. interpolated dates) back to dates.
def num_to_date(nums, first_date):
  first_day, first_month, first_year = first_date
  first_date = dt.datetime(first_year, first_month, first_day, 0,0)
  
  dates = []
  for n in nums:
    new_date = first_date + dt.timedelta(days = int(n))
    dates.append(new_date)

  return dates

final_dates = num_to_date(final_x_vals, first_date)

# df with interpolated values.
new_df = pd.DataFrame({"Date": final_dates, "Value": final_y_vals})

It's very cumbersome and I'm sure there's a more efficient way, but it serves the purpose. Let me know if you have any questions.
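
For what it's worth, here is a shorter sketch of the same idea that lets pandas do the date arithmetic, and also covers the random noise the question asked about. It is only one possible approach, not a definitive one. The function name interpolate_daily and its noise_std and seed parameters are made up for this example, the noise is assumed to be Gaussian, and a few rows of the question's sample are inlined so the snippet runs on its own.

```python
import io

import numpy as np
import pandas as pd

# A few rows from the question's sample, inlined so the sketch is self-contained.
SAMPLE_CSV = """Date,Value
01/06/2010 00:00,42.18
01/07/2010 00:00,43.53
01/08/2010 00:00,39.95
01/09/2010 00:00,41.12
"""


def interpolate_daily(df, noise_std=0.0, seed=None):
    """Return a daily DataFrame linearly interpolated from monthly rows.

    noise_std and seed are hypothetical parameters for this sketch: when
    noise_std > 0, Gaussian noise is added to the interpolated days only,
    so the original monthly values stay unchanged.
    """
    dates = pd.DatetimeIndex(pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M"))
    monthly = pd.Series(df["Value"].to_numpy(dtype=float), index=dates).sort_index()

    # Build an explicit daily index, then fill the gaps linearly in time.
    daily_index = pd.date_range(monthly.index[0], monthly.index[-1], freq="D")
    daily = monthly.reindex(daily_index).interpolate(method="time")

    if noise_std > 0:
        rng = np.random.default_rng(seed)
        noise = rng.normal(0.0, noise_std, size=len(daily))
        # Zero the noise on the dates that came from the original data.
        noise[daily_index.isin(monthly.index)] = 0.0
        daily = daily + noise

    return daily.rename_axis("Date").rename("Value").reset_index()


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
daily_df = interpolate_daily(df, noise_std=0.5, seed=0)
```

Calling interpolate_daily(df) with the default noise_std=0.0 gives plain linear interpolation, which should match what np.interp produces in the code above. The noise level of 0.5 is an arbitrary choice for illustration; pick one that suits the scale of your data.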

1 answers

solution1
0 2022-04-10 19:55:54
