Separate a dataset into a regression group and a control group in R

Question

This is more of a design question. I plan to run the regression Y = X1 + X2 + X3 + X4 + X5, and my data contains X1 through X10 as well as Y. What is the best way to split the dataset into a regression sample and a control group, so that I can fit the model on the regression sample and validate it on the control group? Should I just create a column of random numbers and split on that? Thanks.

Answer 1

If you have a data frame called df containing the columns above, you can randomly sample a fraction of its rows (67% in this example) to form the regression sample and use the remaining rows as the control group:

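# Draw 67% of the row indices at random, without replacement
x <- sample(nrow(df), floor(0.67 * nrow(df)))
# Rows that were drawn form the regression sample
sampledf <- df[x, ]
# All remaining rows form the control group
controldf <- df[-x, ]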

If you then want to reset the row numbers, which still refer to positions in the original df, you can assign new sequential row names like this:

row.names(sampledf) <- seq_len(nrow(sampledf))
row.names(controldf) <- seq_len(nrow(controldf))
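
To round this out, here is a minimal sketch of the remaining steps from the question: fitting the model on the regression sample and checking it against the control group. The data frame below is synthetic and made up purely for illustration (the coefficients, the sample size of 200, and the seed are all assumptions), and RMSE is just one reasonable choice of validation metric, not the only one.

```r
# Synthetic stand-in for the asker's data: Y plus predictors X1..X10.
# (Hypothetical data; replace with your own data frame.)
set.seed(42)                       # fix the RNG so the split is reproducible
n  <- 200
df <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
names(df) <- paste0("X", 1:10)
df$Y <- 1 + 2 * df$X1 - df$X3 + rnorm(n)

# Split following the answer's approach: 67% of rows for fitting,
# the rest held out as the control group.
x         <- sample(nrow(df), floor(0.67 * nrow(df)))
sampledf  <- df[x, ]
controldf <- df[-x, ]

# Fit the regression on the regression sample only.
fit <- lm(Y ~ X1 + X2 + X3 + X4 + X5, data = sampledf)

# Predict the held-out rows and compute the out-of-sample error.
pred <- predict(fit, newdata = controldf)
rmse <- sqrt(mean((controldf$Y - pred)^2))
rmse
```

Calling set.seed() before sample() means the same rows land in each group on every run, which makes the validation result repeatable. Comparing this held-out RMSE with the residual error on sampledf gives a rough indication of whether the model is overfitting.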
