I have a dataframe of boolean variables, indexed by timestamps. The timestamps are irregular and I want to fill in the gaps. I know that the required frequency is 3 ms.
So far, I can do the following:
import pandas as pd

df = pd.read_csv(path, sep=';')
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='raise')
df = df.sort_values('timestamp')
df = df.set_index('timestamp')
# Reindex onto a regular 1 ms grid; reindex returns a new frame, so assign it back
df = df.reindex(pd.date_range(df.index[0], df.index[-1], freq='ms'))
df = df.ffill()
So I am reindexing at a 1 ms interval and forward-filling the missing values, which fits my case: all variables are boolean, so at any moment the current state is the last one that appeared in my data.
How can I resample every 3 milliseconds?
EDIT: It seems that DataFrame.resample can also be used for upsampling. Any suggestions on how to use it in my case? I don't quite understand how it works.
Use DataFrame.asfreq:
import pandas as pd

df = pd.DataFrame({
'timestamp': pd.to_datetime(['2015-02-01 15:14:11.30',
'2015-02-01 15:14:11.36',
'2015-02-01 15:14:11.39']),
'B': [7,10,3]
})
print (df)
timestamp B
0 2015-02-01 15:14:11.300 7
1 2015-02-01 15:14:11.360 10
2 2015-02-01 15:14:11.390 3
df = df.set_index('timestamp').asfreq('3ms', method='ffill')
print (df)
B
timestamp
2015-02-01 15:14:11.300 7
2015-02-01 15:14:11.303 7
2015-02-01 15:14:11.306 7
2015-02-01 15:14:11.309 7
2015-02-01 15:14:11.312 7
2015-02-01 15:14:11.315 7
2015-02-01 15:14:11.318 7
2015-02-01 15:14:11.321 7
2015-02-01 15:14:11.324 7
2015-02-01 15:14:11.327 7
2015-02-01 15:14:11.330 7
2015-02-01 15:14:11.333 7
2015-02-01 15:14:11.336 7
2015-02-01 15:14:11.339 7
2015-02-01 15:14:11.342 7
2015-02-01 15:14:11.345 7
2015-02-01 15:14:11.348 7
2015-02-01 15:14:11.351 7
2015-02-01 15:14:11.354 7
2015-02-01 15:14:11.357 7
2015-02-01 15:14:11.360 10
2015-02-01 15:14:11.363 10
2015-02-01 15:14:11.366 10
2015-02-01 15:14:11.369 10
2015-02-01 15:14:11.372 10
2015-02-01 15:14:11.375 10
2015-02-01 15:14:11.378 10
2015-02-01 15:14:11.381 10
2015-02-01 15:14:11.384 10
2015-02-01 15:14:11.387 10
2015-02-01 15:14:11.390 3
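The sample above has timestamps that fall exactly on the 3 ms grid. For irregular timestamps, as in the question, here is a minimal sketch of what asfreq does; the column name `state` and the values are made up for illustration. The grid starts at the first timestamp, and each grid point takes the most recent original row at or before it, so a final row that lies off the grid does not appear in the result.

```python
import pandas as pd

# Hypothetical boolean data; the gaps (4 ms, 7 ms) are not multiples of 3 ms.
df = pd.DataFrame(
    {"state": [True, False, True]},
    index=pd.to_datetime([
        "2015-02-01 15:14:11.300",
        "2015-02-01 15:14:11.304",
        "2015-02-01 15:14:11.311",
    ]),
)

# asfreq builds a regular grid from the first to the last timestamp
# (.300, .303, .306, .309) and, with method="ffill", fills each grid
# point from the latest original row at or before it.
regular = df.asfreq("3ms", method="ffill")
print(regular)
```

Here the row at .311 is past the last grid point (.309), so its value is not carried into the output. If that last state matters, one option is to append a sentinel row on the grid before calling asfreq.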
If your timestamps are already in the index:
df = df.resample('3ms').ffill()
EDIT: Performance benchmark:
import time
import pandas as pd
dd = {'dt': ['2018-01-01 00:00:00', '2018-01-01 01:12:59'], 'v':[1,1]}
df = pd.DataFrame(data=dd)
df['dt'] = pd.to_datetime(df['dt'])
df = df.set_index('dt')
start = time.time()
df = df.resample('3ms').ffill()
print(time.time() - start)
df = pd.DataFrame(data=dd)
df['dt'] = pd.to_datetime(df['dt'])
df = df.set_index('dt')
start = time.time()
df = df.asfreq('3ms', method='ffill')
print(time.time() - start)
print(df.shape)
Result (elapsed seconds for resample, then for asfreq, then the shape of the upsampled frame):
0.03699994087219238
0.029999732971191406
(1459667, 1)
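A single time.time() measurement is noisy, so the small difference above may not be meaningful. As a rough sketch, the same comparison can be repeated with timeit, keeping the best of several runs; the smaller 3-second range and the repeat counts here are arbitrary choices, not part of the original benchmark.

```python
import timeit
import pandas as pd

# Two rows 3 seconds apart; upsampling to 3 ms gives 1001 rows.
idx = pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 00:00:03"])
df = pd.DataFrame({"v": [1, 1]}, index=idx)

# Run each call 20 times per trial, 5 trials, and keep the fastest trial.
t_resample = min(timeit.repeat(lambda: df.resample("3ms").ffill(), number=20, repeat=5))
t_asfreq = min(timeit.repeat(lambda: df.asfreq("3ms", method="ffill"), number=20, repeat=5))

print(f"resample: {t_resample:.4f}s  asfreq: {t_asfreq:.4f}s")
```

Timings depend on the pandas version and the machine, so treat the numbers as indicative only.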