[英]Merging a list of Multi-index dataframes
我正在尝试将多个结果数据帧从我用 tqdm process_map 调用的 function 合并在一起。 每个 df 具有一列、一个索引和 3 个子索引。
cost_values =(process_map(run_simulation_a0_b0_search, param_list, max_workers=4))
以下是 dfs 的示例:
0.01
0.01 Collisions 0.0073125
Average distance 3.05586
Minimum distance 0.86763
10.0
0.01 Collisions 0
Average distance 0.423096
Minimum distance 0.332057
0.01
10.0 Collisions 0.00090625
Average distance 0.445388
Minimum distance 0.28061
10.0
10.0 Collisions 0
Average distance 0.418373
Minimum distance 0.29708
我试图连接它们,但这没有用,所以我试图合并它们
从一方面我建议查看 output 数据帧的格式。
它使处理变得混乱和缓慢。
作为 pandas 的经验法则,我总是使用平面数据集 - 1D 或 2D。
无论如何,这里有一个带有最小示例的代码来处理您的数据:
import pandas as pd
from tabulate import tabulate
def replicate_nested_df(df, a, b, columns):
# add nested index
df[''] = a
df = df.set_index([''] + [columns])
# add numeric named column
df = df.rename(columns={0: b})
return df
def flatten_nested_df(df):
# flatten and save simulation parameter a and b from nested structure
b = df.columns.values.tolist()[0]
df = df.reset_index()
a = df.iloc[0, :]['']
# rename and drop columns
df = df.rename(columns={"level_1": "feature"})
df = df.rename(columns={b: "values"})
df = df[["feature", "values"]]
# transpose data
df = df.set_index(["feature"])
df = df.transpose().reset_index(drop=True)
df.rename_axis('', axis=1)
# add simulation parameters
df["a"] = a
df["b"] = b
return df
# create mockup dataframes
columns = ["Collisions", "Average distance", "Minimum distance"]
df1 = pd.DataFrame([[0.0073125, 3.05586, 0.86763]], columns=columns).transpose()
df1 = replicate_nested_df(df1, a=0.01, b=0.01, columns=columns)
df2 = pd.DataFrame([[0.003, 3.2, 0.8]], columns=columns).transpose()
df2 = replicate_nested_df(df2, a=0.01, b=10, columns=columns)
# process each dataframe
df_processed = []
for df_i in [df1, df2]:
df_processed.append(flatten_nested_df(df_i))
# create unique frame
df_concat = pd.concat(df_processed).reset_index(drop=True)
print("Mockup Input:")
print("df1:\n", df1)
print("df2:\n", df2)
print("Processed and merged dataset:")
print(tabulate(df_concat, headers=df_concat.columns, tablefmt='psql'))
输入:
Output:
+----+--------------+--------------------+--------------------+------+-------+
| | Collisions | Average distance | Minimum distance | a | b |
|----+--------------+--------------------+--------------------+------+-------|
| 0 | 0.0073125 | 3.05586 | 0.86763 | 0.01 | 0.01 |
| 1 | 0.003 | 3.2 | 0.8 | 0.01 | 10 |
+----+--------------+--------------------+--------------------+------+-------+
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.