Aligning Pandas DataFrame columns by dates
The dataframe is as follows (3 items shown; the original has hundreds):
            Log1       Date2  Log2       Date3  Log3
Date1
01.01.2000  1000  02.01.2000  2000  01.01.2000  3000
02.01.2000  1050  03.01.2000  1950  02.01.2000  3020
03.01.2000  1100  04.01.2000  2000  03.01.2000  3000
The desired output, with the dates aligned:
             Log1   Log2   Log3
Date
01.01.2000  1,000    nan  3,000
02.01.2000  1,050  2,000  3,020
03.01.2000  1,100  1,950  3,000
04.01.2000    nan  2,000    nan
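To make the example reproducible, here is a minimal sketch that rebuilds the three-item frame and confirms that the aligned result needs four rows. It assumes the dates are stored as day-first strings (`dd.mm.yyyy`), which the question does not state explicitly; the names `fmt` and `all_dates` are illustrative only.

```python
import pandas as pd

# Rebuild the sample: Date1 is the index, Date2/Date3 are ordinary columns.
# Assumption: dates are day-first strings, so an explicit format is passed.
df = pd.DataFrame(
    {
        "Log1": [1000, 1050, 1100],
        "Date2": ["02.01.2000", "03.01.2000", "04.01.2000"],
        "Log2": [2000, 1950, 2000],
        "Date3": ["01.01.2000", "02.01.2000", "03.01.2000"],
        "Log3": [3000, 3020, 3000],
    },
    index=pd.Index(["01.01.2000", "02.01.2000", "03.01.2000"], name="Date1"),
)

fmt = "%d.%m.%Y"
d1 = pd.to_datetime(df.index, format=fmt)
d2 = pd.DatetimeIndex(pd.to_datetime(df["Date2"], format=fmt))
d3 = pd.DatetimeIndex(pd.to_datetime(df["Date3"], format=fmt))

# The aligned frame must be indexed by the union of all three date sets.
all_dates = d1.union(d2).union(d3)
print(all_dates)
```

Passing `format` explicitly avoids `01.02.2000` being read as January 2nd rather than February 1st.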
Short example of the dataframe:
            BBAS3      Data.1  PETR4      Data.2   TRSD      Data.3   JKHD
Data
2020-10-05  30.15  2020-10-05  19.91  2020-10-05  30.15  2020-10-05  19.91
2020-10-02  29.71  2020-10-02  19.02  2020-10-02  29.71  2020-10-01  19.85
2020-10-01  29.79  2020-10-01  19.85  2020-10-01  29.79  2020-09-30  19.61
2020-09-30  29.62  2020-09-30  19.61  2020-09-30  29.62  2020-09-29  19.31
2020-09-29  29.76  2020-09-29  19.31  2020-09-29  29.76  2020-09-28  19.63
One idea, if the input data has a DatetimeIndex: loop over the column names in pairs (each date column together with the value column that follows it), create a Series from each pair, and concat them together:
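import pandas as pd

# Convert the DatetimeIndex to a regular column
df1 = df.reset_index()
# Pair each date column (even positions) with the value column after it (odd positions)
zipped = zip(df1.columns[::2], df1.columns[1::2])
# Build one Series per pair, indexed by its own dates; concat aligns them on the union of dates
df1 = pd.concat([df1.set_index(a)[b] for a, b in zipped], axis=1)
df1.index = pd.to_datetime(df1.index)
df1 = df1.sort_index()
print(df1)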
            BBAS3  PETR4   TRSD   JKHD
2020-09-28    NaN    NaN    NaN  19.63
2020-09-29  29.76  19.31  29.76  19.31
2020-09-30  29.62  19.61  29.62  19.61
2020-10-01  29.79  19.85  29.79  19.85
2020-10-02  29.71  19.02  29.71    NaN
2020-10-05  30.15  19.91  30.15  19.91
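For anyone who wants to run this end to end, the following is a self-contained sketch of the same pairing idea on the short sample. It is a variation, not the exact code above: it assumes the `Data.N` columns may hold strings, so every date column is parsed with `pd.to_datetime` before it becomes an index. The names `parts` and `aligned` are illustrative.

```python
import pandas as pd

same_dates = ["2020-10-05", "2020-10-02", "2020-10-01", "2020-09-30", "2020-09-29"]
jkhd_dates = ["2020-10-05", "2020-10-01", "2020-09-30", "2020-09-29", "2020-09-28"]

# Rebuild the short sample: index "Data" dates belong to BBAS3,
# each "Data.N" column holds the dates for the value column after it.
df = pd.DataFrame(
    {
        "BBAS3": [30.15, 29.71, 29.79, 29.62, 29.76],
        "Data.1": same_dates,
        "PETR4": [19.91, 19.02, 19.85, 19.61, 19.31],
        "Data.2": same_dates,
        "TRSD": [30.15, 29.71, 29.79, 29.62, 29.76],
        "Data.3": jkhd_dates,
        "JKHD": [19.91, 19.85, 19.61, 19.31, 19.63],
    },
    index=pd.DatetimeIndex(same_dates, name="Data"),
)

df1 = df.reset_index()
parts = []
for date_col, value_col in zip(df1.columns[::2], df1.columns[1::2]):
    # Parse the dates up front so every Series shares the same index dtype.
    dates = pd.DatetimeIndex(pd.to_datetime(df1[date_col]))
    parts.append(pd.Series(df1[value_col].to_numpy(), index=dates, name=value_col))

# concat with axis=1 aligns the Series on the union of their dates.
aligned = pd.concat(parts, axis=1).sort_index()
print(aligned)
```

Parsing first keeps a datetime index from being mixed with string indexes, which would otherwise prevent identical dates from lining up.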
EDIT:
# The code above raises an error on this sample data because a date column (here Data) contains duplicated dates
print(df)
             BBAS3      Data.1  PETR4      Data.2   TRSD      Data.3   JKHD
Data
2020-10-05  200.00  2020-10-05  19.91  2020-10-05  30.15  2020-10-05  19.91
2020-10-05  100.00  2020-10-02  19.02  2020-10-02  29.71  2020-10-01  19.85
2020-10-01   29.79  2020-10-01  19.85  2020-10-01  29.79  2020-09-30  19.61
2020-09-30   29.62  2020-09-30  19.61  2020-09-30  29.62  2020-09-29  19.31
2020-09-29   29.76  2020-09-29  19.31  2020-09-29  29.76  2020-09-28  19.63
df1 = df.reset_index()
zipped = zip(df1.columns[::2], df1.columns[1::2])
# groupby(...).sum() collapses duplicated dates, so each Series has a unique index before concat
df1 = pd.concat([df1.groupby(a)[b].sum() for a, b in zipped], axis=1)
df1.index = pd.to_datetime(df1.index)
df1 = df1.sort_index()
print(df1)
             BBAS3  PETR4   TRSD   JKHD
2020-09-28     NaN    NaN    NaN  19.63
2020-09-29   29.76  19.31  29.76  19.31
2020-09-30   29.62  19.61  29.62  19.61
2020-10-01   29.79  19.85  29.79  19.85
2020-10-02     NaN  19.02  29.71    NaN
2020-10-05  300.00  19.91  30.15  19.91
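Summing is only one way to resolve duplicated dates. Whether to sum, average, or keep the first or last value depends on what the data means, which the question does not say. The sketch below wraps the `groupby` approach in a hypothetical helper, `align`, that takes the aggregation as a parameter; it runs on a reduced two-item version of the sample, already reset to plain columns.

```python
import pandas as pd

# Reduced sample: "Data" contains 2020-10-05 twice.
df1 = pd.DataFrame(
    {
        "Data": ["2020-10-05", "2020-10-05", "2020-10-01"],
        "BBAS3": [200.0, 100.0, 29.79],
        "Data.1": ["2020-10-05", "2020-10-02", "2020-10-01"],
        "PETR4": [19.91, 19.02, 19.85],
    }
)

def align(frame, how="sum"):
    """Hypothetical helper: align (date, value) column pairs, resolving
    duplicated dates with the aggregation named by `how`."""
    pairs = zip(frame.columns[::2], frame.columns[1::2])
    parts = [frame.groupby(d)[v].agg(how) for d, v in pairs]
    out = pd.concat(parts, axis=1)
    out.index = pd.to_datetime(out.index)
    return out.sort_index()

summed = align(df1, "sum")    # duplicated dates are added together
first = align(df1, "first")   # only the first value per date is kept
print(summed)
print(first)
```

With `"sum"` the duplicated 2020-10-05 rows for BBAS3 become 300.0, matching the output above; with `"first"` the value stays 200.0.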