
How can I split a matrix into training and testing data whilst ensuring there is at least one value present in every row and column of the training matrix?

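I want to randomly split a sparse matrix into training and testing data of the same dimensions, while ensuring that the training set contains no column or row filled entirely with zeros.

For my algorithm to work, I need at least one value in every row and every column of the training set.

I tried using this library function: from sklearn.model_selection import train_test_split

For example, given the matrix: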

[[0, 1, 3, 1],
[0, 0, 0, 1],
[8, 0, 0, 1]]

the matrix could be split to produce this training matrix:

[[0, 1, 0, 1],
[0, 0, 0, 0],
[0, 0, 0, 8]]

where the second row contains only zeros. How can I avoid this?

from sklearn.model_selection import KFold 
import numpy as np 

# Create some dummy data
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [0, 0]])

# Remove rows having all of their columns equal to 0
X = X[~np.all(X == 0, axis=1)]

# Assuming 2-fold cross-validation
kf = KFold(n_splits=2)
kf.get_n_splits(X)

Now kf provides two training/testing folds:

for training, testing in kf.split(X):
    X_train, X_test = X[training], X[testing]

    # Do whatever you want with your model ...

    print("Training:", training, "Testing:", testing)


Training: [2 3] Testing: [0 1]
Training: [0 1] Testing: [2 3]
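
Note that KFold splits whole rows, so the zero-row filter is applied before splitting rather than to each training fold. If the goal is instead to hold out individual entries while the training matrix keeps the original shape, one possible approach is to reserve one nonzero entry per row and per column for training and sample the test entries only from the rest. The following is a minimal sketch of that idea, not part of the answer above: the function name split_entries and its parameters test_fraction and seed are hypothetical, and it assumes a dense NumPy array (a scipy sparse matrix would need converting first).

```python
import numpy as np


def split_entries(matrix, test_fraction=0.2, seed=0):
    """Split nonzero entries into train/test matrices of the same shape.

    Hypothetical helper: every row and column that has a nonzero entry in
    `matrix` keeps at least one nonzero entry in the training matrix.
    """
    rng = np.random.default_rng(seed)
    matrix = np.asarray(matrix)
    rows, cols = np.nonzero(matrix)

    # Reserve one randomly chosen nonzero entry per row for training.
    protected = np.zeros(rows.size, dtype=bool)
    for r in np.unique(rows):
        protected[rng.choice(np.flatnonzero(rows == r))] = True

    # Do the same for any column not already covered by the row pass.
    for c in np.unique(cols):
        if not protected[cols == c].any():
            protected[rng.choice(np.flatnonzero(cols == c))] = True

    # Draw the test entries only from the unprotected ones.
    candidates = np.flatnonzero(~protected)
    n_test = min(int(round(test_fraction * rows.size)), candidates.size)
    test_idx = rng.permutation(candidates)[:n_test]

    test = np.zeros_like(matrix)
    test[rows[test_idx], cols[test_idx]] = matrix[rows[test_idx], cols[test_idx]]
    train = matrix - test  # held-out positions become zero in training
    return train, test


M = np.array([[0, 1, 3, 1],
              [0, 0, 0, 1],
              [8, 0, 0, 1]])
train, test = split_entries(M, test_fraction=0.5, seed=1)
```

Because some entries are reserved for training, the test set can end up smaller than test_fraction asks for when the matrix is very sparse. In the example matrix, four of the six nonzero entries are the only value in their row or column, so at most two entries can ever be held out.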

