Converting a list of dictionaries to a DataFrame

Question

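I am trying to convert a list of dictionaries that I obtained from the code here. When I convert my results to a DataFrame, all I get is the CatBoost results; the other two models are not recorded. The result is a list of dictionaries that I cannot turn into a clean DataFrame.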

models = [('LogisticRegressor', LogisticRegression()),
         ('RandomForest', RandomForestClassifier()),
         ('CatBoost', CatBoostClassifier(silent=True))]
dfs=[]
results=[]
names=[]
scoring=['accuracy', 'precision', 'recall', 'f1']

for name, model in models:
    cv_results = cross_validate(model, X_train, y_train, scoring=scoring,
                               cv=5, n_jobs=-1)
    clf = model.fit(X_train, y_train)
    results.append(cv_results)
    names.append(name)
df1=pd.DataFrame(cv_results)
df1['model'] = name
dfs.append(df1)

results
[{'fit_time': array([0.03125048, 0.03125048, 0.03125048, 0.03125048, 0.        ]),
  'score_time': array([0.        , 0.01562595, 0.        , 0.        , 0.01562405]),
  'test_accuracy': array([0.71544715, 0.77235772, 0.73170732, 0.77235772, 0.75409836]),
  'test_precision': array([0.59459459, 0.74193548, 0.63888889, 0.75862069, 0.67647059]),
  'test_recall': array([0.52380952, 0.53488372, 0.53488372, 0.51162791, 0.54761905]),
  'test_f1': array([0.55696203, 0.62162162, 0.58227848, 0.61111111, 0.60526316])},
 {'fit_time': array([0.28125215, 0.29687738, 0.29687738, 0.29687738, 0.15625048]),
  'score_time': array([0.03125119, 0.03125095, 0.01562595, 0.03125095, 0.01564193]),
  'test_accuracy': array([0.72357724, 0.69918699, 0.69918699, 0.77235772, 0.75409836]),
  'test_precision': array([0.58695652, 0.58333333, 0.57894737, 0.72727273, 0.6875    ]),
  'test_recall': array([0.64285714, 0.48837209, 0.51162791, 0.55813953, 0.52380952]),
  'test_f1': array([0.61363636, 0.53164557, 0.54320988, 0.63157895, 0.59459459])},
 {'fit_time': array([2.26015663, 2.26015663, 2.26115775, 2.3036592 , 1.60637379]),
  'score_time': array([0.00900555, 0.00900555, 0.00900674, 0.00900674, 0.00500464]),
  'test_accuracy': array([0.71544715, 0.72357724, 0.71544715, 0.76422764, 0.75409836]),
  'test_precision': array([0.57777778, 0.62162162, 0.61111111, 0.71875   , 0.6875    ]),
  'test_recall': array([0.61904762, 0.53488372, 0.51162791, 0.53488372, 0.52380952]),
  'test_f1': array([0.59770115, 0.575     , 0.55696203, 0.61333333, 0.59459459])}]

Answer 1

In short, you can do this:

df = (pd.DataFrame(results)
      .assign(model=model_names)
      .apply(pd.Series.explode))
df
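
If the chained version is hard to follow, the same reshaping can be sketched more explicitly: build one small frame per model (each array becomes a column, one row per fold) and stack them with pd.concat. The values below are made-up toy numbers with two folds per model, and the fold column and the summary step are my own additions for illustration, not part of the original answer. As far as I know, explode generally leaves the columns with object dtype, whereas this route keeps them numeric, which makes the groupby at the end straightforward.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the `results` list (made-up values, two folds per model).
results = [
    {"fit_time": np.array([0.1, 0.2]), "test_accuracy": np.array([0.70, 0.75])},
    {"fit_time": np.array([0.3, 0.4]), "test_accuracy": np.array([0.72, 0.74])},
]
model_names = ["LogisticRegressor", "RandomForest"]

# One frame per model: each array becomes a column, one row per fold.
frames = [
    pd.DataFrame(res).assign(model=name, fold=np.arange(len(res["fit_time"])))
    for name, res in zip(model_names, results)
]

# Stack the frames; ignore_index=True gives a clean 0..n-1 index.
df = pd.concat(frames, ignore_index=True)
print(df)

# Mean of each metric per model (the fold number is dropped first).
summary = df.drop(columns="fold").groupby("model").mean()
print(summary)
```

Each row of df is then one fold of one model, and summary has one row per model.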

A complete, reproducible example of the explode chain, with its output:

import pandas as pd
import numpy as np
results = [{'fit_time': np.array([0.03125048, 0.03125048, 0.03125048, 0.03125048, 0.        ]),
  'score_time': np.array([0.        , 0.01562595, 0.        , 0.        , 0.01562405]),
  'test_accuracy': np.array([0.71544715, 0.77235772, 0.73170732, 0.77235772, 0.75409836]),
  'test_precision': np.array([0.59459459, 0.74193548, 0.63888889, 0.75862069, 0.67647059]),
  'test_recall': np.array([0.52380952, 0.53488372, 0.53488372, 0.51162791, 0.54761905]),
  'test_f1': np.array([0.55696203, 0.62162162, 0.58227848, 0.61111111, 0.60526316])},
 {'fit_time': np.array([0.28125215, 0.29687738, 0.29687738, 0.29687738, 0.15625048]),
  'score_time': np.array([0.03125119, 0.03125095, 0.01562595, 0.03125095, 0.01564193]),
  'test_accuracy': np.array([0.72357724, 0.69918699, 0.69918699, 0.77235772, 0.75409836]),
  'test_precision': np.array([0.58695652, 0.58333333, 0.57894737, 0.72727273, 0.6875    ]),
  'test_recall': np.array([0.64285714, 0.48837209, 0.51162791, 0.55813953, 0.52380952]),
  'test_f1': np.array([0.61363636, 0.53164557, 0.54320988, 0.63157895, 0.59459459])},
 {'fit_time': np.array([2.26015663, 2.26015663, 2.26115775, 2.3036592 , 1.60637379]),
  'score_time': np.array([0.00900555, 0.00900555, 0.00900674, 0.00900674, 0.00500464]),
  'test_accuracy': np.array([0.71544715, 0.72357724, 0.71544715, 0.76422764, 0.75409836]),
  'test_precision': np.array([0.57777778, 0.62162162, 0.61111111, 0.71875   , 0.6875    ]),
  'test_recall': np.array([0.61904762, 0.53488372, 0.51162791, 0.53488372, 0.52380952]),
  'test_f1': np.array([0.59770115, 0.575     , 0.55696203, 0.61333333, 0.59459459])}]
model_names = ['LogisticRegressor', 'RandomForest', 'CatBoost']
df = (pd.DataFrame(results)
      .assign(model=model_names)
      .apply(pd.Series.explode))
df
Out[1]: 
   fit_time score_time test_accuracy test_precision test_recall   test_f1  \
0  0.031250   0.000000      0.715447       0.594595    0.523810  0.556962   
0  0.031250   0.015626      0.772358       0.741935    0.534884  0.621622   
0  0.031250   0.000000      0.731707       0.638889    0.534884  0.582278   
0  0.031250   0.000000      0.772358       0.758621    0.511628  0.611111   
0  0.000000   0.015624      0.754098       0.676471    0.547619  0.605263   
1  0.281252   0.031251      0.723577       0.586957    0.642857  0.613636   
1  0.296877   0.031251      0.699187       0.583333    0.488372  0.531646   
1  0.296877   0.015626      0.699187       0.578947    0.511628  0.543210   
1  0.296877   0.031251      0.772358       0.727273    0.558140  0.631579   
1  0.156250   0.015642      0.754098       0.687500    0.523810  0.594595   
2  2.260157   0.009006      0.715447       0.577778    0.619048  0.597701   
2  2.260157   0.009006      0.723577       0.621622    0.534884  0.575000   
2  2.261158   0.009007      0.715447       0.611111    0.511628  0.556962   
2  2.303659   0.009007      0.764228       0.718750    0.534884  0.613333   
2  1.606374   0.005005      0.754098       0.687500    0.523810  0.594595   

               model  
0  LogisticRegressor  
0  LogisticRegressor  
0  LogisticRegressor  
0  LogisticRegressor  
0  LogisticRegressor  
1       RandomForest  
1       RandomForest  
1       RandomForest  
1       RandomForest  
1       RandomForest  
2           CatBoost  
2           CatBoost  
2           CatBoost  
2           CatBoost  
2           CatBoost
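
As for why only CatBoost showed up in the question: in the code as posted, the three lines that build df1 are not indented, so they run once after the loop has finished, when cv_results and name still hold the values from the last model. A minimal sketch of the loop with those lines moved inside is below. To keep it runnable without scikit-learn or CatBoost, fake_cross_validate is a hypothetical stand-in that just returns random per-fold arrays, and the (name, seed) pairs are placeholders for the real estimators; swap the real cross_validate call back in for actual use.

```python
import numpy as np
import pandas as pd


def fake_cross_validate(seed, n_folds=5):
    """Hypothetical stand-in for cross_validate: a dict of per-fold arrays."""
    rng = np.random.default_rng(seed)
    return {
        "fit_time": rng.random(n_folds),
        "test_accuracy": rng.random(n_folds),
    }


# Placeholder (name, seed) pairs instead of real estimators.
models = [("LogisticRegressor", 0), ("RandomForest", 1), ("CatBoost", 2)]

dfs = []
for name, seed in models:
    cv_results = fake_cross_validate(seed)
    # These three steps sit inside the loop, so they run once per model.
    df1 = pd.DataFrame(cv_results)
    df1["model"] = name
    dfs.append(df1)

# Combine the per-model frames into one DataFrame.
final = pd.concat(dfs, ignore_index=True)
print(final["model"].value_counts())
```

With the lines inside the loop, dfs holds one frame per model and final has one row per fold per model.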
