Splitting into training and testing sets in R?

Question

How can I write the following Python code in R?

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, random_state=42)

That is, splitting into training and testing sets with an 80/20 ratio.

Answer 1

Probably the simplest way to do it:

# read in the iris dataset
data(iris)
library(caret) # this package has the createDataPartition function

set.seed(123) # for reproducibility

# creating indices (sampled within each Species); use p = 0.8 for an 80/20 split
trainIndex <- createDataPartition(iris$Species, p = 0.75, list = FALSE)

# splitting data into training/testing data using the trainIndex object
IRIS_TRAIN <- iris[trainIndex, ] # training data (~75% of data)

IRIS_TEST <- iris[-trainIndex, ] # testing data (~25% of data)
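When given a factor such as iris$Species, createDataPartition samples within each level, so every class keeps roughly the same share in both sets. The same idea can be sketched in base R without caret. This is only an illustration under that assumption: `stratified_split` is a hypothetical helper, not caret's implementation, and it will not select the same rows as createDataPartition for a given seed.

```r
# Hypothetical helper: stratified train/test split in base R (not caret's code)
stratified_split <- function(labels, p = 0.8) {
  # Row indices grouped by class label
  groups <- split(seq_along(labels), labels)
  # Within each class, draw ceiling(p * class size) rows for training
  train_idx <- unlist(lapply(groups, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

data(iris)
set.seed(123)
train_idx <- stratified_split(iris$Species, p = 0.8)
iris_train <- iris[train_idx, ]
iris_test  <- iris[-train_idx, ]

# Each species keeps 40 of its 50 rows in the training set
table(iris_train$Species)
```

Sampling by position (`sample.int(length(idx), ...)`) rather than `sample(idx, ...)` avoids the surprise where `sample()` treats a length-one vector `n` as `1:n`.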

Answer 2

You can do this using caret's createDataPartition function:

library(caret)

# Make example data
X = data.frame(matrix(rnorm(200), nrow = 100))
y = rnorm(100)

# Extract random sample of indices for test data
set.seed(42) # analogous to Python's random_state arg (the split itself will differ from Python's)
test_inds = createDataPartition(y = 1:length(y), p = 0.2, list = FALSE)

# Split data into test/train using indices
X_test = X[test_inds, ]; y_test = y[test_inds]
X_train = X[-test_inds, ]; y_train = y[-test_inds]

You could also create test_inds "from scratch" in base R using test_inds = sample(1:length(y), ceiling(length(y) * 0.2)).
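If you want something that reads like the Python call in the question, the from-scratch approach can be wrapped in a small base-R function. This is a sketch only: the name `train_test_split` and its arguments are hypothetical here, chosen to mirror scikit-learn's, and the same `random_state` value will not give the same rows as Python because R and NumPy use different random number generators.

```r
# Hypothetical base-R analogue of sklearn's train_test_split (not a function from any R package)
train_test_split <- function(X, y, test_size = 0.2, random_state = NULL) {
  # Seeding here changes the global RNG state, like calling set.seed() yourself
  if (!is.null(random_state)) set.seed(random_state)
  n <- length(y)
  stopifnot(NROW(X) == n)
  # Number of test rows, rounded up with ceiling()
  n_test <- ceiling(n * test_size)
  test_inds <- sample(seq_len(n), n_test)
  # Use the same indices for X and y so rows stay paired
  list(
    X_train = X[-test_inds, , drop = FALSE],
    X_test  = X[test_inds, , drop = FALSE],
    y_train = y[-test_inds],
    y_test  = y[test_inds]
  )
}

# Usage with the same example data as above
X <- data.frame(matrix(rnorm(200), nrow = 100))
y <- rnorm(100)
parts <- train_test_split(X, y, test_size = 0.2, random_state = 42)
X_train <- parts$X_train; X_test <- parts$X_test
y_train <- parts$y_train; y_test <- parts$y_test
c(nrow(X_train), nrow(X_test)) # 80 20
```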

Answer 3

Using base R, you can do the following:

set.seed(12345)
# sample 20% of the indices (in this case 20 out of 100) for the test set;
# use the same indices for x and y so the observations stay paired
test <- sample(1:100, 20)

# simulating random data
x <- rnorm(100)
y <- rnorm(100)

# sub-setting the x data
training.x.data <- x[-test]
testing.x.data <- x[test]

# sub-setting the y data
training.y.data <- y[-test]
testing.y.data <- y[test]

Answer 4

Let's take the iris dataset:

# set a seed for reproducibility
set.seed(5)
## 75% of the sample size (use 0.8 for an 80/20 split)
train_size <- floor(0.75 * nrow(iris))

in_rows <- sample(seq_len(nrow(iris)), size = train_size, replace = FALSE)

train <- iris[in_rows, ]
test <- iris[-in_rows, ]

Answer metadata (votes, posting time):

Answer 1: 5, 2017-11-09 20:16:26
Answer 2: 1, 2017-11-09 20:05:28
Answer 3: 1, accepted, 2017-11-09 20:20:24
Answer 4: 0, 2020-08-19 22:55:06
