Inserting dates into large dataframe
I have a DataFrame df with three columns: a, b and dt. I want to insert rows into this df so that, wherever two rows share the same values of a and b, every date in between is inserted as a new row, with those values of a and b repeated.
>>> import pandas as pd
>>> from datetime import datetime as dt
>>> df = pd.DataFrame({'a':['abd', 'abd', 'rds', 'rds', 'rsd', 'rsd', 'tsb'], 'b':['ar','ar','pr','pr','sg','sg','sg'], 'dt':[dt(2013,1,1), dt(2013,1,4), dt(2014,7,3), dt(2014,7,14), dt(2016,4,8), dt(2016,4,9), dt(2016,4,9)]})
>>> df
     a   b          dt
0  abd  ar  2013-01-01
1  abd  ar  2013-01-04
2  rds  pr  2014-07-03
3  rds  pr  2014-07-14
4  rsd  sg  2016-04-08
5  rsd  sg  2016-04-09
6  tsb  sg  2016-04-09
>>>
The desired output df is as follows:
>>> df
      a   b          dt
0   abd  ar  2013-01-01
1   abd  ar  2013-01-02
2   abd  ar  2013-01-03
3   abd  ar  2013-01-04
4   rds  pr  2014-07-03
5   rds  pr  2014-07-04
6   rds  pr  2014-07-05
7   rds  pr  2014-07-06
8   rds  pr  2014-07-07
9   rds  pr  2014-07-08
10  rds  pr  2014-07-09
11  rds  pr  2014-07-10
12  rds  pr  2014-07-11
13  rds  pr  2014-07-12
14  rds  pr  2014-07-13
15  rds  pr  2014-07-14
16  rsd  sg  2016-04-08
17  rsd  sg  2016-04-09
18  tsb  sg  2016-04-09
>>>
This is a groupby-and-resample operation: group by a and b, resample each group to daily frequency, and forward-fill a and b into the newly created dates. Try:
(df.set_index('dt')
.groupby(['a', 'b'], group_keys=False, as_index=False)
.resample('D')
.ffill()
.reset_index())
            dt    a   b
0   2013-01-01  abd  ar
1   2013-01-02  abd  ar
2   2013-01-03  abd  ar
3   2013-01-04  abd  ar
4   2014-07-03  rds  pr
5   2014-07-04  rds  pr
6   2014-07-05  rds  pr
7   2014-07-06  rds  pr
8   2014-07-07  rds  pr
9   2014-07-08  rds  pr
10  2014-07-09  rds  pr
11  2014-07-10  rds  pr
12  2014-07-11  rds  pr
13  2014-07-12  rds  pr
14  2014-07-13  rds  pr
15  2014-07-14  rds  pr
16  2016-04-08  rsd  sg
17  2016-04-09  rsd  sg
18  2016-04-09  tsb  sg
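Depending on the pandas version, groupby(...).resample(...) may warn about, or change how it handles, carrying the grouping columns a and b through into the result. The sketch below is a swapped-in, more explicit technique (not the answer's method): it computes the first and last date of each (a, b) pair and builds the daily range with pd.date_range. The helper name fill_dates is hypothetical and chosen for illustration only.

```python
import pandas as pd
from datetime import datetime as dt

df = pd.DataFrame({
    'a': ['abd', 'abd', 'rds', 'rds', 'rsd', 'rsd', 'tsb'],
    'b': ['ar', 'ar', 'pr', 'pr', 'sg', 'sg', 'sg'],
    'dt': [dt(2013, 1, 1), dt(2013, 1, 4), dt(2014, 7, 3), dt(2014, 7, 14),
           dt(2016, 4, 8), dt(2016, 4, 9), dt(2016, 4, 9)],
})


def fill_dates(frame):
    """Hypothetical helper: return one row per calendar day between the
    first and last date of each (a, b) pair, with a and b repeated."""
    # Earliest and latest date for every (a, b) pair, sorted by the keys.
    bounds = frame.groupby(['a', 'b'])['dt'].agg(['min', 'max'])
    pieces = []
    for (a, b), row in bounds.iterrows():
        # Every day from start to end, both endpoints included.
        days = pd.date_range(row['min'], row['max'], freq='D')
        # The scalars a and b are broadcast to the length of days.
        pieces.append(pd.DataFrame({'a': a, 'b': b, 'dt': days}))
    return pd.concat(pieces, ignore_index=True)


out = fill_dates(df)
print(out)
```

A pair that appears only once (tsb/sg here) yields a single row. The Python loop runs once per (a, b) pair rather than once per row, which keeps it manageable for a large df with relatively few distinct pairs.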