[英]how to append two columns in a dataframe side by side in pandas
我有一个名为df_final
的数据框输入,如下所示:
location 2022-01 2022-02 2022-03 2022-01 2022-02 2022-03
AGH 2062.25 6561.43 3154.43 0 0 0
AIM 3122.74 3277.56 3631.45 0 0 0
AJL 1641.82 3229.49 4937.23 0 0 0
.................
第一批月份(前三列月份)实际上是总成本,而后三列实际上是总CN。 我生成此输出的代码如下所示:
# Get monthly delivery CN
total_delivery_CN = monthly_delivery_CN_by_branch(start_date, end_date)
# Get monthly pickup CN
total_pickup_CN = monthly_pickup_CN_by_branch(start_date, end_date)
# Get monthly maintainence cost:
total_maintainence_cost = monthly_maintainence_cost_by_branch(start_date, end_date)
# Get monthly refuel cost:
total_refuel_cost = monthly_refuel_cost_by_branch(start_date, end_date)
# Union
df = pd.concat([total_delivery_CN, total_pickup_CN, total_maintainence_cost, total_refuel_cost])
print('Union')
print(df)
df.to_csv('df.csv')
# Pivot
df_final = df.pivot_table(index=['location'], columns = ['report_month'], aggfunc = {'total_CN' : np.sum, 'Cost' : np.sum} , fill_value=0)
df_final.columns = df_final.columns.to_flat_index().to_series().apply(lambda x: x[1])
df_final = df_final.reset_index()
df_final.to_csv('df_final.csv')
print(df_final)
但是,我会在同一个月合并两个总数,因此最终输出如下所示:
2022-01 2022-02 2022-03
location Cost total_CN Cost total_CN Cost total_CN
AGH 2062.25 0 6561.43 0 3154.43 0
AIM 3122.74 0 3277.56 0 3631.45 0
AJL 1641.82 0 3229.49 0 4937.23 0
.................
我如何在同一个月下追加这两个总数?
我从数据框开始:
df = pd.DataFrame(
[
[2,3,4,0.2,0.3,0.4],
[2,3,4,0.2,0.3,0.4],
[2,3,4,0.2,0.3,0.4]
],
index=('AGH', 'AIM', 'AJL'),
columns=('2022-01', '2022-02', '2022-03', '2022-01', '2022-02', '2022-03')
).rename_axis('location')
看起来像
2022-01 2022-02 2022-03 2022-01 2022-02 2022-03
location
AGH 2 3 4 0.2 0.3 0.4
AIM 2 3 4 0.2 0.3 0.4
AJL 2 3 4 0.2 0.3 0.4
这就是你要找的:
def flatten(l):
return [item for sublist in l for item in sublist]
def create_multiindex(cols):
return pd.MultiIndex.from_tuples(
flatten([[(char, 'Cost'), (char, 'total_CN')] for char in sorted(set(cols))])
)
df = df[sorted(set(df.columns))]
df.columns = create_multiindex(df.columns)
它产生:
2022-01 2022-02 2022-03
Cost total_CN Cost total_CN Cost total_CN
location
AGH 2 0.2 3 0.3 4 0.4
AIM 2 0.2 3 0.3 4 0.4
AJL 2 0.2 3 0.3 4 0.4
您可以将 df_final 分割成两个数据帧,然后您可以像我们在此处所做的那样将两个帧连接起来:
df_cost = df_final.iloc[:, 0:3]
df_total_cn = df_final.iloc[:, 3:6]
df = pd.concat([df_cost, df_total_cn], axis=1, keys=['cost', 'total cn']).swaplevel(axis=1).sort_index('columns').fillna(0)
print(df)
2022-01 2022-02 2022-03
cost total cn cost total cn cost total cn
location
AGH 2062.25 0 6561.43 0 3154.43 0
AIM 3122.74 0 3277.56 0 3631.45 0
AJL 1641.82 0 3229.49 0 4937.23 0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.