How to combine columns in pandas pivot table?
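I have a pivot table with 3 levels of columns. For each unique mean and std pair, I want to combine them into a single str, f"{x.mean}({x.std})",
replacing the mean and std columns with a new mean_std_str column.
Here is a printout of the dataframe: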
rescore_func asp chemscore ... goldscore plp
tag best first best ... first best first
mean std mean std mean ... std mean std mean std
dock_func ...
asp 65.2 0.7 34.5 2.4 64.0 ... 0.0 64.4 0.7 37.9 0.7
chemscore 59.1 2.0 29.5 2.0 58.0 ... 1.7 58.7 0.7 40.9 2.3
goldscore 68.9 1.7 34.8 4.3 69.7 ... 1.3 68.9 1.3 46.2 0.7
plp 69.3 1.1 35.2 2.0 69.7 ... 2.0 68.9 2.4 39.4 2.9
[4 rows x 16 columns]
Desired output:
rescore_func asp
tag best first ...
mean_std mean_std ...
dock_func ...
asp 65.2(0.7) 34.5(2.4) ...
chemscore 59.1(2.0) 29.5(2.0) ...
goldscore 68.9(1.7) 34.8(4.3) ...
plp 69.3(1.1) 35.2(2.0) ...
[4 rows x 16 columns]
So far I have:
df = df.melt(ignore_index=False).reset_index()
df = df.rename(columns=str).rename(columns={'None':'descr'})
This gives:
dock_func rescore_func tag descr value
0 asp asp best mean 65.2
1 chemscore asp best mean 59.1
2 goldscore asp best mean 68.9
3 plp asp best mean 69.3
4 asp asp best std 0.7
.. ... ... ... ... ...
59 plp plp first mean 39.4
60 asp plp first std 0.7
61 chemscore plp first std 2.3
62 goldscore plp first std 0.7
63 plp plp first std 2.9
[64 rows x 5 columns]
I'm stuck on how to combine the means and stds before re-pivoting the data...
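One possible way to finish the long-format route started above, as a minimal sketch: spread descr back into mean and std columns, build the string, then unstack rescore_func and tag into the column header. The four-row frame `long_df` is a hypothetical stand-in for the melted data (only the column names come from the question), and the one-decimal format is an assumption.

```python
import pandas as pd

# Hypothetical slice of the melted frame; column names follow the question.
long_df = pd.DataFrame(
    {
        "dock_func": ["asp", "plp", "asp", "plp"],
        "rescore_func": ["asp", "asp", "asp", "asp"],
        "tag": ["best", "best", "best", "best"],
        "descr": ["mean", "mean", "std", "std"],
        "value": [65.2, 69.3, 0.7, 1.1],
    }
)

# Put mean and std side by side: one row per (dock_func, rescore_func, tag).
stats = long_df.set_index(["dock_func", "rescore_func", "tag", "descr"])["value"].unstack("descr")

# Build the "mean(std)" string, formatting each number to one decimal place.
fmt = "{:.1f}".format
mean_std = stats["mean"].map(fmt) + "(" + stats["std"].map(fmt) + ")"

# Move rescore_func and tag back into the columns to restore the wide layout.
result = mean_std.unstack(["rescore_func", "tag"])
print(result)
```

Formatting explicitly, rather than relying on `astype(str)`, keeps the output stable when the values have not been rounded beforehand.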
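DataFrame.reorder_levels will make this easy: move the mean/std level to the outermost position, so that df["mean"] and df["std"] each select a whole sub-frame with matching columns.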
Here is some sample data:
import numpy as np
import pandas as pd
index = pd.Index(["asp", "chemscore", "goldscore", "plp"], name="dock_func")
columns = pd.MultiIndex.from_product(
[index, pd.Index(["best", "fisrt"], name="tag"), ("mean", "std")]
)
df = pd.DataFrame(
np.random.random(size=(4, 16)),
index=index,
columns=columns,
).round(1)
df
Which looks like:
dock_func asp chemscore goldscore plp
tag best fisrt best fisrt best fisrt best fisrt
mean std mean std mean std mean std mean std mean std mean std mean std
dock_func
asp 0.5 0.6 0.4 0.2 0.7 0.7 0.8 0.1 0.2 0.5 0.6 0.7 0.5 0.2 0.2 0.7
chemscore 0.0 0.7 0.9 0.2 0.3 0.3 0.4 0.8 0.3 0.4 0.2 0.8 0.5 0.5 0.4 0.2
goldscore 0.5 0.7 0.8 0.0 0.2 0.8 0.1 0.2 0.6 0.1 0.4 0.2 0.8 0.2 0.8 0.3
plp 1.0 0.6 0.6 0.8 0.8 0.6 0.3 1.0 0.7 0.2 0.8 0.2 0.2 0.2 0.7 0.2
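Then just run the following:
# Move the mean/std level (position 2) to the front and cast all values to str.
df = df.reorder_levels([2, 0, 1], axis=1).astype(str)
# Concatenate the two sub-frames element-wise to get "mean(std)" strings.
df = df["mean"] + "(" + df["std"] + ")"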
and df is:
dock_func asp chemscore goldscore plp
tag best fisrt best fisrt best fisrt best fisrt
dock_func
asp 0.5(0.6) 0.4(0.2) 0.7(0.7) 0.8(0.1) 0.2(0.5) 0.6(0.7) 0.5(0.2) 0.2(0.7)
chemscore 0.0(0.7) 0.9(0.2) 0.3(0.3) 0.4(0.8) 0.3(0.4) 0.2(0.8) 0.5(0.5) 0.4(0.2)
goldscore 0.5(0.7) 0.8(0.0) 0.2(0.8) 0.1(0.2) 0.6(0.1) 0.4(0.2) 0.8(0.2) 0.8(0.3)
plp 1.0(0.6) 0.6(0.8) 0.8(0.6) 0.3(1.0) 0.7(0.2) 0.8(0.2) 0.2(0.2) 0.7(0.2)
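A variant of the same idea, sketched under assumptions: instead of reordering levels, `DataFrame.xs` can pull out the mean and std cross-sections directly, and explicit formatting avoids depending on how `astype(str)` renders floats. The small frame below is made-up data shaped like the question's table, and the names `combined` and `labeled` are illustrative only. The last step adds back a "mean_std" label as a third column level, as in the desired output.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's pivot table (values are made up).
index = pd.Index(["asp", "chemscore"], name="dock_func")
columns = pd.MultiIndex.from_product(
    [["asp", "plp"], ["best", "first"], ["mean", "std"]],
    names=["rescore_func", "tag", None],
)
values = np.arange(16, dtype=float).reshape(2, 8) / 10
df = pd.DataFrame(values, index=index, columns=columns)

# One cross-section per statistic; xs drops the selected level by default.
mean = df.xs("mean", axis=1, level=2)
std = df.xs("std", axis=1, level=2)


def to_text(frame):
    """Format every value in the frame to one decimal place, as strings."""
    return frame.apply(lambda col: col.map("{:.1f}".format))


# Element-wise string concatenation; both frames share the same columns.
combined = to_text(mean) + "(" + to_text(std) + ")"

# Optional: add "mean_std" as a new outer level, then move it innermost.
labeled = pd.concat({"mean_std": combined}, axis=1).reorder_levels([1, 2, 0], axis=1)
print(labeled)
```

If the third level is not needed, `combined` already has the two-level layout shown in the answer's output.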