Selecting rows from a 2D array that are not in another 2D array
I want to randomly choose 25% of the rows of a 2D array, then return both the rows I chose and the rows I did not choose.
Example:
cols = [['republican' 'n' 'y' 'n' 'n' 'y']
        ['republican' 'n' 'y' 'y' 'n' 'n']
        ['democrat' 'y' 'y' 'y' 'n' 'n']
        ['republican' 'n' 'y' 'y' 'n' 'y']]
new = [['democrat' 'y' 'y' 'y' 'n' 'n']]
rem = [['republican' 'n' 'y' 'n' 'n' 'y']
       ['republican' 'n' 'y' 'y' 'n' 'n']
       ['republican' 'n' 'y' 'y' 'n' 'y']]
Here is my code:
def splitRandom(cols, percent):
    rng = np.random.default_rng()
    new = cols[np.random.choice(cols.shape[0], int(cols.shape[0]*(percent/100)), replace=False)]
    rem = #help here
    return new,rem
I tried to use setdiff1d, but it didn't work, and I don't know how to build a mask for it.
I modified your code slightly:
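import numpy as np

def splitRandom(cols, percent):
    rng = np.random.default_rng()
    # Draw the row indices once so both outputs use the same selection
    choice = rng.choice(cols.shape[0], int(cols.shape[0] * (percent / 100)), replace=False)
    new = cols[choice]
    # np.delete returns a copy of cols without the chosen rows
    rem = np.delete(cols, choice, axis=0)
    return new, rem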
Edit: I assume cols is a NumPy array.
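As a quick sanity check, the index-and-delete idea can be run on the four-row example from the question. This is only a sketch: the snake_case name `split_random` and the `seed` parameter are assumptions added here so the run is repeatable, and they are not part of the original code.

```python
import numpy as np

def split_random(cols, percent, seed=None):
    # seed is a hypothetical extra parameter, added only for repeatable runs
    rng = np.random.default_rng(seed)
    n_pick = int(cols.shape[0] * (percent / 100))
    picked = rng.choice(cols.shape[0], n_pick, replace=False)
    # Fancy indexing gives the chosen rows; np.delete gives a copy without them
    return cols[picked], np.delete(cols, picked, axis=0)

cols = np.array([
    ['republican', 'n', 'y', 'n', 'n', 'y'],
    ['republican', 'n', 'y', 'y', 'n', 'n'],
    ['democrat', 'y', 'y', 'y', 'n', 'n'],
    ['republican', 'n', 'y', 'y', 'n', 'y'],
])

new, rem = split_random(cols, 25, seed=0)
print(new.shape, rem.shape)  # (1, 6) (3, 6)
```

Which row ends up in `new` depends on the seed, but the two outputs together always contain every input row exactly once.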
Here is an alternative approach that uses a boolean mask and leaves the input array untouched:
def splitRandom(cols, percent):
    rng = np.random.default_rng()
    choice = rng.choice(cols.shape[0], int(cols.shape[0] * (percent / 100)), replace=False)
    # Boolean mask: True for the chosen rows, False for the rest
    mask = np.zeros(cols.shape[0], dtype=bool)
    mask[choice] = True
    return cols[mask], cols[~mask]
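One difference between the two approaches may be worth knowing: indexing with the drawn indices keeps the rows in the order they were drawn, while indexing with a boolean mask returns them in their original order. The sketch below uses a small integer array and hand-picked indices (both are assumptions for illustration, not data from the question) so the result is deterministic.

```python
import numpy as np

cols = np.arange(10).reshape(5, 2)   # rows: [0 1], [2 3], [4 5], [6 7], [8 9]
choice = np.array([3, 1])            # hand-picked indices standing in for a random draw

# Index-based selection keeps the draw order: row 3, then row 1
by_index = cols[choice]

# Mask-based selection follows the original row order: row 1, then row 3
mask = np.zeros(cols.shape[0], dtype=bool)
mask[choice] = True
by_mask = cols[mask]

# The remainder is the same either way: rows 0, 2 and 4
rem_delete = np.delete(cols, choice, axis=0)
rem_mask = cols[~mask]

print(by_index.tolist())  # [[6, 7], [2, 3]]
print(by_mask.tolist())   # [[2, 3], [6, 7]]
```

For a random split the order of the selected rows usually does not matter, so either form should work; the mask version is simply convenient when the original ordering needs to be preserved.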
Note that this sort of operation is common enough that machine learning frameworks often have built-in methods to do it efficiently; for example, here is the equivalent using sklearn.model_selection.train_test_split:
from sklearn.model_selection import train_test_split
# Returns the 25% sample first, then the remaining rows
new, rem = train_test_split(cols, train_size=0.25)