Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required
Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required
This error keeps appearing when I try to compute the MI scores. My code is as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_selection import mutual_info_regression

X_new = X.copy()
X_new = X_new.fillna(0)
y = data.SalePrice

def make_mi_scores(X, y):
    X = X.copy()
    # Encode object/category columns as integer codes
    for colname in X.select_dtypes(["object", "category"]):
        X[colname], _ = X[colname].factorize()
    # Treat integer-typed columns as discrete features
    discrete_features = [pd.api.types.is_integer_dtype(t) for t in X.dtypes]
    mi_scores = mutual_info_regression(X, y, discrete_features=discrete_features, random_state=0)
    mi_scores = pd.Series(mi_scores, name="MI Scores", index=X.columns)
    mi_scores = mi_scores.sort_values(ascending=False)
    return mi_scores

def plot_mi_scores(scores):
    scores = scores.sort_values(ascending=True)
    width = np.arange(len(scores))
    ticks = list(scores.index)
    plt.barh(width, scores)
    plt.yticks(width, ticks)
    plt.title("Mutual Information Scores")

plt.figure(dpi=100, figsize=(8, 5))
plot_mi_scores(make_mi_scores(X_new, y))
If you want the full notebook, here is a link: https://www.kaggle.com/code/snigdhkarki/house-price-competition
The error is as follows:
ValueError Traceback (most recent call last)
/tmp/ipykernel_19/1575243112.py in <module>
42
43 plt.figure(dpi=100, figsize=(8, 5))
---> 44 plot_mi_scores(make_mi_scores(X_new,y))
/tmp/ipykernel_19/1575243112.py in make_mi_scores(X, y)
28 print(X.isnull().any().any())
29 print(y.isnull().any().any())
---> 30 mi_scores = mutual_info_regression(X, y, discrete_features=discrete_features, random_state=0)
31 mi_scores = pd.Series(mi_scores, name="MI Scores", index=X.columns)
32 mi_scores = mi_scores.sort_values(ascending=False)
/opt/conda/lib/python3.7/site-packages/sklearn/feature_selection/_mutual_info.py in mutual_info_regression(X, y, discrete_features, n_neighbors, copy, random_state)
382 of a Random Vector", Probl. Peredachi Inf., 23:2 (1987), 9-16
383 """
--> 384 return _estimate_mi(X, y, discrete_features, False, n_neighbors, copy, random_state)
385
386
/opt/conda/lib/python3.7/site-packages/sklearn/feature_selection/_mutual_info.py in _estimate_mi(X, y, discrete_features, discrete_target, n_neighbors, copy, random_state)
300 mi = [
301 _compute_mi(x, y, discrete_feature, discrete_target, n_neighbors)
--> 302 for x, discrete_feature in zip(_iterate_columns(X), discrete_mask)
303 ]
304
/opt/conda/lib/python3.7/site-packages/sklearn/feature_selection/_mutual_info.py in <listcomp>(.0)
300 mi = [
301 _compute_mi(x, y, discrete_feature, discrete_target, n_neighbors)
--> 302 for x, discrete_feature in zip(_iterate_columns(X), discrete_mask)
303 ]
304
/opt/conda/lib/python3.7/site-packages/sklearn/feature_selection/_mutual_info.py in _compute_mi(x, y, x_discrete, y_discrete, n_neighbors)
160 return mutual_info_score(x, y)
161 elif x_discrete and not y_discrete:
--> 162 return _compute_mi_cd(y, x, n_neighbors)
163 elif not x_discrete and y_discrete:
164 return _compute_mi_cd(x, y, n_neighbors)
/opt/conda/lib/python3.7/site-packages/sklearn/feature_selection/_mutual_info.py in _compute_mi_cd(c, d, n_neighbors)
137 radius = radius[mask]
138
--> 139 kd = KDTree(c)
140 m_all = kd.query_radius(c, radius, count_only=True, return_distance=False)
141 m_all = np.array(m_all) - 1.0
sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._kd_tree.BinaryTree.__init__()
/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
806 "Found array with %d sample(s) (shape=%s) while a"
807 " minimum of %d is required%s."
--> 808 % (n_samples, array.shape, ensure_min_samples, context)
809 )
810
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.
Only a handful of places have discussed this issue, and even there I could not find an answer to my problem.
The problem occurs where you call mutual_info_regression, here:
mi_scores = mutual_info_regression(X, y, discrete_features=discrete_features, random_state=0)
According to the sklearn documentation, the discrete_features parameter should be a boolean mask that is True for discrete variables and False otherwise.
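To illustrate what such a mask looks like, here is a minimal self-contained sketch; the data frame, column names, and target are invented for the example and are not from the question:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.RandomState(0)
df = pd.DataFrame({
    "rooms": rng.randint(1, 6, size=200),        # integer-valued -> discrete
    "area": rng.uniform(50.0, 300.0, size=200),  # float-valued   -> continuous
})
target = 3.0 * df["rooms"] + 0.1 * df["area"] + rng.normal(size=200)

# Boolean mask aligned with the columns: True for discrete, False for continuous.
mask = [pd.api.types.is_integer_dtype(t) for t in df.dtypes]
print(mask)  # [True, False]

scores = mutual_info_regression(df, target, discrete_features=mask, random_state=0)
print(pd.Series(scores, index=df.columns))
```

The mask must line up element-by-element with the columns of X, which is why it is built from X.dtypes rather than written out by hand.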
I checked your Kaggle code, and the technique you use to distinguish discrete from continuous features in the dataframe appears to be wrong.
A simple trick to get the code running is to treat all features as continuous, using the following code:
discrete_features = [False] * 73
# 73 is the number of columns X has
However, if the mutual_info_regression algorithm needs the discrete and continuous features identified accurately, the results obtained this way may be wrong.
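Note that, per the sklearn documentation, discrete_features also accepts a single bool, so passing discrete_features=False is a shorter equivalent of the all-False list above. A sketch with synthetic data (not the question's dataset):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + rng.normal(size=100)

# discrete_features=False is equivalent to [False] * X.shape[1]:
# every feature is treated as continuous.
scores = mutual_info_regression(X, y, discrete_features=False, random_state=0)
print(scores.shape)  # (3,)
```

This avoids hard-coding the column count (73), which would silently break if columns are later added or dropped.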