Convert list of dicts into DataFrame
I am trying to convert the list of dicts produced by the code below into a DataFrame. When I convert my results to a DataFrame, all I get is the CatBoost results; the other two models are not recorded. The results are a list of dicts that I cannot convert to a clean DataFrame.
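import pandas as pd
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# X_train and y_train are the training data (not shown in the question)
models = [('LogisticRegressor', LogisticRegression()),
          ('RandomForest', RandomForestClassifier()),
          ('CatBoost', CatBoostClassifier(silent=True))]
dfs = []
results = []
names = []
scoring = ['accuracy', 'precision', 'recall', 'f1']
for name, model in models:
    cv_results = cross_validate(model, X_train, y_train, scoring=scoring,
                                cv=5, n_jobs=-1)
    clf = model.fit(X_train, y_train)
    results.append(cv_results)
    names.append(name)
    df1 = pd.DataFrame(cv_results)
    df1['model'] = name
    dfs.append(df1)
results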
[{'fit_time': array([0.03125048, 0.03125048, 0.03125048, 0.03125048, 0. ]),
'score_time': array([0. , 0.01562595, 0. , 0. , 0.01562405]),
'test_accuracy': array([0.71544715, 0.77235772, 0.73170732, 0.77235772, 0.75409836]),
'test_precision': array([0.59459459, 0.74193548, 0.63888889, 0.75862069, 0.67647059]),
'test_recall': array([0.52380952, 0.53488372, 0.53488372, 0.51162791, 0.54761905]),
'test_f1': array([0.55696203, 0.62162162, 0.58227848, 0.61111111, 0.60526316])},
{'fit_time': array([0.28125215, 0.29687738, 0.29687738, 0.29687738, 0.15625048]),
'score_time': array([0.03125119, 0.03125095, 0.01562595, 0.03125095, 0.01564193]),
'test_accuracy': array([0.72357724, 0.69918699, 0.69918699, 0.77235772, 0.75409836]),
'test_precision': array([0.58695652, 0.58333333, 0.57894737, 0.72727273, 0.6875 ]),
'test_recall': array([0.64285714, 0.48837209, 0.51162791, 0.55813953, 0.52380952]),
'test_f1': array([0.61363636, 0.53164557, 0.54320988, 0.63157895, 0.59459459])},
{'fit_time': array([2.26015663, 2.26015663, 2.26115775, 2.3036592 , 1.60637379]),
'score_time': array([0.00900555, 0.00900555, 0.00900674, 0.00900674, 0.00500464]),
'test_accuracy': array([0.71544715, 0.72357724, 0.71544715, 0.76422764, 0.75409836]),
'test_precision': array([0.57777778, 0.62162162, 0.61111111, 0.71875 , 0.6875 ]),
'test_recall': array([0.61904762, 0.53488372, 0.51162791, 0.53488372, 0.52380952]),
'test_f1': array([0.59770115, 0.575 , 0.55696203, 0.61333333, 0.59459459])}]
In short, you can do the following, where model_names is the list of model names (names in the question's code):
df = (pd.DataFrame(results)
        .assign(model=model_names)
        .apply(pd.Series.explode))
df
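To see what the chained call does, here is a minimal sketch on made-up numbers (toy_results, toy_names and all values are illustrative assumptions, not taken from the question). pd.DataFrame on a list of dicts yields one row per model with a whole array in each cell, and exploding turns each array element into its own row. The sketch uses DataFrame.explode with a list of columns, which needs pandas 1.3 or later, as a possible alternative to .apply(pd.Series.explode). It also casts the metrics to float, since exploded columns come back with object dtype.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cross_validate output: two models, three folds each.
# All names and numbers here are made up for illustration.
toy_results = [
    {"test_accuracy": np.array([0.70, 0.80, 0.75]),
     "test_f1": np.array([0.60, 0.65, 0.62])},
    {"test_accuracy": np.array([0.72, 0.78, 0.74]),
     "test_f1": np.array([0.61, 0.66, 0.63])},
]
toy_names = ["ModelA", "ModelB"]

# One row per model; each metric cell holds a whole array.
wide_df = pd.DataFrame(toy_results).assign(model=toy_names)

# Explode all metric columns together (multi-column explode needs pandas >= 1.3).
metric_cols = ["test_accuracy", "test_f1"]
long_df = wide_df.explode(metric_cols, ignore_index=True)

# Exploded columns have object dtype, so cast them back to float.
long_df = long_df.astype({col: float for col in metric_cols})
```

With ignore_index=True the result gets a fresh 0..n-1 index instead of repeating each model's original row label.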
Full reproducible example with output:
import pandas as pd
import numpy as np
results = [{'fit_time': np.array([0.03125048, 0.03125048, 0.03125048, 0.03125048, 0. ]),
'score_time': np.array([0. , 0.01562595, 0. , 0. , 0.01562405]),
'test_accuracy': np.array([0.71544715, 0.77235772, 0.73170732, 0.77235772, 0.75409836]),
'test_precision': np.array([0.59459459, 0.74193548, 0.63888889, 0.75862069, 0.67647059]),
'test_recall': np.array([0.52380952, 0.53488372, 0.53488372, 0.51162791, 0.54761905]),
'test_f1': np.array([0.55696203, 0.62162162, 0.58227848, 0.61111111, 0.60526316])},
{'fit_time': np.array([0.28125215, 0.29687738, 0.29687738, 0.29687738, 0.15625048]),
'score_time': np.array([0.03125119, 0.03125095, 0.01562595, 0.03125095, 0.01564193]),
'test_accuracy': np.array([0.72357724, 0.69918699, 0.69918699, 0.77235772, 0.75409836]),
'test_precision': np.array([0.58695652, 0.58333333, 0.57894737, 0.72727273, 0.6875 ]),
'test_recall': np.array([0.64285714, 0.48837209, 0.51162791, 0.55813953, 0.52380952]),
'test_f1': np.array([0.61363636, 0.53164557, 0.54320988, 0.63157895, 0.59459459])},
{'fit_time': np.array([2.26015663, 2.26015663, 2.26115775, 2.3036592 , 1.60637379]),
'score_time': np.array([0.00900555, 0.00900555, 0.00900674, 0.00900674, 0.00500464]),
'test_accuracy': np.array([0.71544715, 0.72357724, 0.71544715, 0.76422764, 0.75409836]),
'test_precision': np.array([0.57777778, 0.62162162, 0.61111111, 0.71875 , 0.6875 ]),
'test_recall': np.array([0.61904762, 0.53488372, 0.51162791, 0.53488372, 0.52380952]),
'test_f1': np.array([0.59770115, 0.575 , 0.55696203, 0.61333333, 0.59459459])}]
model_names = ['LogisticRegressor', 'RandomForest', 'CatBoost']
df = (pd.DataFrame(results)
        .assign(model=model_names)
        .apply(pd.Series.explode))
df
Out[1]:
   fit_time  score_time  test_accuracy  test_precision  test_recall   test_f1              model
0  0.031250    0.000000       0.715447        0.594595     0.523810  0.556962  LogisticRegressor
0  0.031250    0.015626       0.772358        0.741935     0.534884  0.621622  LogisticRegressor
0  0.031250    0.000000       0.731707        0.638889     0.534884  0.582278  LogisticRegressor
0  0.031250    0.000000       0.772358        0.758621     0.511628  0.611111  LogisticRegressor
0  0.000000    0.015624       0.754098        0.676471     0.547619  0.605263  LogisticRegressor
1  0.281252    0.031251       0.723577        0.586957     0.642857  0.613636       RandomForest
1  0.296877    0.031251       0.699187        0.583333     0.488372  0.531646       RandomForest
1  0.296877    0.015626       0.699187        0.578947     0.511628  0.543210       RandomForest
1  0.296877    0.031251       0.772358        0.727273     0.558140  0.631579       RandomForest
1  0.156250    0.015642       0.754098        0.687500     0.523810  0.594595       RandomForest
2  2.260157    0.009006       0.715447        0.577778     0.619048  0.597701           CatBoost
2  2.260157    0.009006       0.723577        0.621622     0.534884  0.575000           CatBoost
2  2.261158    0.009007       0.715447        0.611111     0.511628  0.556962           CatBoost
2  2.303659    0.009007       0.764228        0.718750     0.534884  0.613333           CatBoost
2  1.606374    0.005005       0.754098        0.687500     0.523810  0.594595           CatBoost
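Another option, sketched here under the assumption that every dict in the list holds equal-length arrays (one value per fold), is to build one DataFrame per model and concatenate them with pd.concat. This is close to what the dfs list in the question's loop collects. The names toy_results and toy_names and all values below are made up for illustration. Because each frame is built directly from arrays, the metric columns are float from the start, so a per-model summary can follow immediately.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for `results` and `names`: two models, three folds each.
toy_results = [
    {"test_accuracy": np.array([0.70, 0.80, 0.75]),
     "test_f1": np.array([0.60, 0.65, 0.62])},
    {"test_accuracy": np.array([0.72, 0.78, 0.74]),
     "test_f1": np.array([0.61, 0.66, 0.63])},
]
toy_names = ["ModelA", "ModelB"]

# One frame per model (one row per fold), tagged with the model name.
frames = [pd.DataFrame(cv).assign(model=name)
          for name, cv in zip(toy_names, toy_results)]

# Stack the per-model frames into a single long table.
all_folds = pd.concat(frames, ignore_index=True)

# Mean of each metric across folds, one row per model.
summary = all_folds.groupby("model").mean()
```

In the question's code the same idea would amount to calling pd.concat on the dfs list after the loop, rather than inspecting df1, which only holds the last model's frame.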