
Separate a dataset into a regression group and a control group in R

This is more of a design question. I plan to run the regression Y = X1 + X2 + X3 + X4 + X5, and my data contains X1 through X10 as well as Y. What is the best way to split the dataset into a regression sample and a control group, so that I can fit the model on the regression sample and validate it on the control group? Should I just create a column of random numbers and split on that? Thanks.

If you have a data frame called df with the columns above, you can randomly sample a fraction of its rows (67% in this example) as the sample group and use the remaining rows as the control group:

x <- sample(nrow(df), 0.67*nrow(df))
sampledf <- df[x, ]
controldf <- df[-x, ]
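Since sample() draws different rows on every run, fixing the seed makes the split reproducible. The following is a minimal sketch, assuming a made-up toy data frame in place of the real df. It also shows the random-number-column approach from the question, which yields a split of the same sizes. The names n_sample, rand, in_sample, sampledf2 and controldf2 are illustrative only.

```r
# Toy data standing in for the real df (hypothetical values)
set.seed(42)                      # fix the RNG so the split is reproducible
df <- data.frame(Y = rnorm(100), X1 = rnorm(100), X2 = rnorm(100))

# Index-based split: floor() makes the sample size an explicit integer
n_sample <- floor(0.67 * nrow(df))
x <- sample(nrow(df), n_sample)
sampledf  <- df[x, ]
controldf <- df[-x, ]

# Same idea with a column of random numbers, as the question suggests:
# the rows with the smallest random values go to the regression sample
df$rand <- runif(nrow(df))
in_sample <- rank(df$rand) <= n_sample
sampledf2  <- df[in_sample, ]
controldf2 <- df[!in_sample, ]
```

Both variants give a random partition. The index-based one avoids adding a helper column to the data.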

If you then want to renumber the rows, you can assign new sequential row names like this:

row.names(sampledf) <- seq_len(nrow(sampledf))
row.names(controldf) <- seq_len(nrow(controldf))
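To connect the split back to the question's goal, here is a rough sketch of fitting the regression on the sample group and checking it against the control group. The data is simulated purely for illustration, and using RMSE as the validation measure is an assumption, not something the question specifies.

```r
# Hypothetical data: Y depends linearly on some of X1..X5 plus noise
set.seed(1)
n <- 200
df <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(df) <- paste0("X", 1:5)
df$Y <- 1 + 2 * df$X1 - df$X3 + rnorm(n)

# Split as in the answer above
x <- sample(nrow(df), floor(0.67 * nrow(df)))
sampledf  <- df[x, ]
controldf <- df[-x, ]

# Fit on the regression sample only
fit <- lm(Y ~ X1 + X2 + X3 + X4 + X5, data = sampledf)

# Predict on the held-out control group and measure out-of-sample error
pred <- predict(fit, newdata = controldf)
rmse <- sqrt(mean((controldf$Y - pred)^2))
rmse
```

If the control-group error is much larger than the in-sample residual error reported by summary(fit), the model may be overfitting the regression sample.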
