How to simulate pandas DataFrame data with incrementing datetimes
In Python 3 with pandas, assume I have the following DataFrame:
datetime,id,value
2020-03-12,1,100
2020-03-13,1,105
2020-03-14,1,110
2020-03-12,2,100
2020-03-13,2,105
2020-03-14,2,110
I am trying to simulate this dataset with x extra historical days.
Let us say x=2 for now, and we won't add any new IDs, just the IDs already in the dataset. The value column can be incremental or random.
The first thing we have to do is extend the date range:
df2=pd.DataFrame(pd.date_range(pd.to_datetime('today'), periods=10, freq='1440min'))
df['datetime']=df['datetime'].append(df2)
Then I got:
ValueError: cannot reindex from a duplicate axis
How can I extend the dates for each existing ID without hitting this error?
One way is to set_index on the id and datetime columns, then reindex with all the dates you want (generated through date_range and combined with the ids using pd.MultiIndex.from_product), and finally reset_index to put them back as columns, like:
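import pandas as pd

# make sure the datetime column has a datetime dtype
df['datetime'] = pd.to_datetime(df['datetime'])

# number of extra days to add after the last existing date
x = 2

# every (id, date) pair from the first date to the last date plus x days
full_index = pd.MultiIndex.from_product(
    [df['id'].unique(),
     pd.date_range(df['datetime'].min(),
                   df['datetime'].max() + pd.Timedelta(days=x))],
    names=['id', 'datetime'])

# reindex on (id, datetime); rows that did not exist before get value=120
df_re = (df.set_index(['id', 'datetime'])
           .reindex(full_index, fill_value=120)
           .reset_index())
print(df_re)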
   id   datetime  value
0   1 2020-03-12    100
1   1 2020-03-13    105
2   1 2020-03-14    110
3   1 2020-03-15    120
4   1 2020-03-16    120
5   2 2020-03-12    100
6   2 2020-03-13    105
7   2 2020-03-14    110
8   2 2020-03-15    120
9   2 2020-03-16    120
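
The question also says the value column can be incremental or random, while the reindex above fills every new row with the constant 120. Here is a minimal sketch of both options, built on the same reindex idea. The step of 5, the random range 100 to 120, and the column names value_incremental and value_random are assumptions made for illustration; they do not come from the original post. The incremental part also assumes the new rows come after the existing rows for each id.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2020-03-12", "2020-03-13", "2020-03-14"] * 2),
    "id": [1, 1, 1, 2, 2, 2],
    "value": [100, 105, 110, 100, 105, 110],
})

x = 2      # number of extra days to add
step = 5   # assumed daily increment (hypothetical, inferred from the sample data)

full_index = pd.MultiIndex.from_product(
    [df["id"].unique(),
     pd.date_range(df["datetime"].min(),
                   df["datetime"].max() + pd.Timedelta(days=x))],
    names=["id", "datetime"],
)

# Reindex without fill_value so the newly created rows hold NaN
df_re = df.set_index(["id", "datetime"]).reindex(full_index).reset_index()

# Flag the new rows and count, per id, how many days past the last known value they are
is_new = df_re["value"].isna()
days_ahead = is_new.astype(int).groupby(df_re["id"]).cumsum()

# Incremental: carry the last known value forward and add step for each new day
df_re["value_incremental"] = (
    df_re.groupby("id")["value"].ffill() + step * days_ahead
)

# Random: keep existing values, draw integers in [100, 120] for the new rows
rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
random_values = rng.integers(100, 121, size=len(df_re))
df_re["value_random"] = df_re["value"].where(~is_new, random_values)

print(df_re)
```

To add the extra days before the earliest date instead of after the latest one, the start of date_range can be shifted with df["datetime"].min() - pd.Timedelta(days=x); the incremental fill above would then need to work backwards (for example with bfill) rather than forwards.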