Interpolation of datetime data in Python numpy

I have a .csv file with two columns, which I have imported as a numpy array. The first column holds datetime data, with one entry per month. The second column holds the corresponding value for that month.

I want to interpolate the data to create a new datetime row for every day, each with a corresponding value. If possible, I would also like to add some random noise to the interpolated values, but I know this is a lot to ask.
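The noise can be added independently of how the interpolation is done. Below is a minimal sketch. It assumes independent Gaussian noise with an arbitrary standard deviation of 0.5, applied only to the newly created daily points so the original monthly values stay exact. The seed, sigma, and variable names are illustrative choices, not anything from the question.

```python
import numpy as np

# Day offsets (days since 01/06/2010) and values for the first three sample rows.
original_x = np.array([0, 30, 61])
original_y = np.array([42.18, 43.53, 39.95])

# Linear interpolation onto one point per day.
daily_x = np.arange(original_x[-1] + 1)
daily_y = np.interp(daily_x, original_x, original_y)

# Seeded generator for reproducible noise; sigma = 0.5 is an arbitrary choice.
rng = np.random.default_rng(seed=0)
sigma = 0.5

# True for days that were filled in, False for the original monthly dates.
is_interpolated = ~np.isin(daily_x, original_x)

# One Gaussian draw per day, applied only where is_interpolated is True.
noise = rng.normal(loc=0.0, scale=sigma, size=daily_x.size)
noisy_y = np.where(is_interpolated, daily_y + noise, daily_y)
```

Each day's noise is drawn independently, so the result will look jagged. Whether that is realistic depends on what the values represent.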

Here is a sample of the data:

Date,Value
01/06/2010 00:00,42.18
01/07/2010 00:00,43.53
01/08/2010 00:00,39.95
01/09/2010 00:00,41.12
01/10/2010 00:00,43.5
01/11/2010 00:00,46.4
01/12/2010 00:00,58.03
01/01/2011 00:00,48.43
01/02/2011 00:00,46.47
01/03/2011 00:00,51.41
01/04/2011 00:00,50.88
01/05/2011 00:00,50.27
01/06/2011 00:00,50.82

Thanks very much for your help. I know of scipy.interpolate, but I'm not sure whether it can work with datetime values.
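The interpolators in numpy and scipy.interpolate are designed for numeric arrays, so the usual workaround is to convert each date to a number of days since the first date, interpolate in that numeric space, and convert the results back to dates. Below is a minimal numpy-only sketch using the first three sample rows. The DD/MM/YYYY strings are reordered by hand because np.datetime64 parses ISO 8601 strings. The same numeric x array could instead be passed to scipy.interpolate.interp1d.

```python
import numpy as np

# First three rows of the sample, as DD/MM/YYYY strings (time part dropped).
raw_dates = ["01/06/2010", "01/07/2010", "01/08/2010"]
values = np.array([42.18, 43.53, 39.95])

# np.datetime64 parses ISO 8601, so reorder DD/MM/YYYY to YYYY-MM-DD.
iso = [f"{y}-{m}-{d}" for d, m, y in (s.split("/") for s in raw_dates)]
dates = np.array(iso, dtype="datetime64[D]")

# Numeric x axis: whole days since the first date (0, 30, 61).
x = (dates - dates[0]).astype(np.int64)

# One x value per day from the first to the last date, inclusive.
daily_x = np.arange(x[-1] + 1)
daily_values = np.interp(daily_x, x, values)

# Convert the day offsets back to calendar dates.
daily_dates = dates[0] + daily_x.astype("timedelta64[D]")

print(daily_dates[0], daily_values[0])    # 2010-06-01 42.18
print(daily_dates[-1], daily_values[-1])  # 2010-08-01 39.95
```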

Assuming your Date column is sorted and contains strings (not parsed dates), and your Value column contains floats, here is one way to get an interpolated value for every day between the first and last date, assuming DD/MM/YYYY format:

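import datetime as dt

import numpy as np
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical file name; point this at your own .csv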

df[["Date", "Time"]] = df["Date"].str.split(' ', expand=True)
df[["Day", "Month", "Year"]] = df["Date"].str.split('/', expand=True)

first_date = (int(df["Day"].iloc[0]), int(df["Month"].iloc[0]), int(df["Year"].iloc[0]))

# Get the number of days between date entries so each date can be turned
# into a float: the number of days since the 1st date.

dates = pd.to_datetime(df["Date"], format='%d/%m/%Y')

# diff() gives the gap between consecutive dates. The first row has no
# predecessor (NaT), but it should be 0 days since the 1st date, so fill it
# with 0 before the cumulative sum turns the gaps into days since the start.

diff = dates.diff().dt.days.fillna(0).cumsum()

# original_x holds each date as a float: the number of days since the 1st date.
original_x = diff.to_numpy().flatten()
final_x_vals = np.arange(0, original_x[-1] + 1, 1)
original_y = df["Value"].to_numpy().astype(float)

final_y_vals = np.interp(final_x_vals, original_x, original_y)

# Function to turn the final_x_vals (i.e. interpolated dates) back to dates.
def num_to_date(nums, first_date):
  first_day, first_month, first_year = first_date
  first_date = dt.datetime(first_year, first_month, first_day, 0,0)
  
  dates = []
  for n in nums:
    new_date = first_date + dt.timedelta(days = int(n))
    dates.append(new_date)

  return dates

final_dates = num_to_date(final_x_vals, first_date)

# DataFrame with the interpolated daily values.
new_df = pd.DataFrame({"Date": final_dates, "Value": final_y_vals})

It's very cumbersome and I'm sure there's a more efficient way, but it serves the purpose. Let me know if you have any questions.
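A shorter route, sketched here as an alternative rather than the method above, is to let pandas do the date handling. Put the values in a Series indexed by the parsed dates, upsample to daily frequency with asfreq("D"), and fill the new days with interpolate(method="time"), which weights by the actual time gaps between points. The inline CSV text below stands in for the real file.

```python
import io

import pandas as pd

# A few rows of the sample data; with a real file use pd.read_csv(path) instead.
csv_text = """Date,Value
01/06/2010 00:00,42.18
01/07/2010 00:00,43.53
01/08/2010 00:00,39.95
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the day-first dates and use them as the index.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M")
series = df.set_index("Date")["Value"]

# asfreq("D") inserts a NaN row for every missing day between the first and
# last date; interpolate(method="time") then fills them linearly in time.
daily = series.asfreq("D").interpolate(method="time")

print(len(daily))  # 62
```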
