
Multiple regressions with subsets of data using dplyr in R

I have a data frame "DF" with this glimpse() output:

Observations: 1244160
Variables:
$ Test      (fctr) 72001.txt, 72002.txt, 72003.txt, 72004.txt, 72005.txt,...
$ x         (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
$ y         (int) 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2...
$ Value     (dbl) -77.111111, -13.111111, 13.888889, 235.888889, 138.8888...
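Answers can be tried against data shaped like DF without the full 1,244,160 rows by using a small stand-in. The sketch below is hypothetical: the three Test labels, the 10 x 10 grid of x and y, and the linear formula that generates Value are all made up for illustration. Only the column names and types follow the glimpse() output above.

```r
library(dplyr)

set.seed(42)
# Hypothetical stand-in for DF: three made-up tests on a 10 x 10 integer grid
DF <- expand.grid(x = 1:10, y = 1:10,
                  Test = factor(c("72001.txt", "72002.txt", "72003.txt")))
# Made-up response: a linear trend plus noise, so lm(Value ~ x + y) has something to fit
DF$Value <- 10 * DF$x - 5 * DF$y + rnorm(nrow(DF), sd = 20)
DF <- DF[, c("Test", "x", "y", "Value")]  # same column order as the glimpse() output

glimpse(DF)
```

With this grid, the condition 0 < x < 6, 0 < y < 6 keeps 25 rows per Test.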

For each Test, I want to model "Value" using a subset of the data:

  1. The function: Value ~ x + y
  2. The data: only the rows where 0 < x < 6 and 0 < y < 6

Then I want to use these models to predict "Value" for all the rows in "DF" (each Test with its own model).

For these calculations, I want to use dplyr. However, I can't find a way to do it. This was my last attempt:

DF %>% 
    group_by(Test) %>% 
    do({
        mod = lm(Value ~ x + y, data = (. %>% filter((x > 0) &  (x < 6) & (y > 0) & (y < 6))))
        print(mod)
        Pred <- predict(mod, .)
        data.frame(. , Pred)
    })
glimpse()

But it's failing. Can you help me?
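One likely cause, at least in the lm() call: a magrittr pipeline whose left-hand side is the bare placeholder "." is not evaluated on the current data. magrittr turns ". %>% filter(...)" into a function (a functional sequence), so lm() receives a function rather than a data frame. A minimal check on mtcars:

```r
library(dplyr)

# A pipeline that starts with the bare placeholder `.` is not run immediately:
# magrittr returns a function (a "functional sequence") instead of data
f <- . %>% filter(vs == 0)
is.function(f)                 # TRUE, so lm(data = f) gets a function, not rows

# Calling filter() on the data directly gives the filtered data frame
nrow(filter(mtcars, vs == 0))  # 18
```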

Reproducible example

To test answers, we can use a built-in data frame as a reproducible dummy example, e.g. mtcars:

mtcars %>% 
    group_by(cyl) %>% 
    do({ 
        mod = lm(mpg ~ wt + qsec, data = . %>% filter(vs == 0))
        print(mod)
        Pred <- predict(mod)
        data.frame(. , Pred)
    })
glimpse()
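The mtcars example is thinner than it looks once vs == 0 is applied: the 4-cylinder group keeps a single row, and the 6-cylinder group keeps three, exactly as many as the model has coefficients. That matters for any answer tested on it, since the 4-cylinder fit is rank-deficient and predict() may warn about it when given newdata. A quick check:

```r
library(dplyr)

# Rows left for each per-cyl model after keeping only vs == 0
mtcars %>% filter(vs == 0) %>% count(cyl)
# cyl 4: 1 row, cyl 6: 3 rows, cyl 8: 14 rows

# With a single row only the intercept is estimable: wt and qsec come back NA
coef(lm(mpg ~ wt + qsec, data = filter(mtcars, cyl == 4, vs == 0)))
```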

Use the subset argument of the lm function.

results <- DF %>% 
           group_by(Test) %>% 
           do(mod = lm(Value ~ x + y, data = ., subset = x > 0 & x < 6 & y > 0 & y < 6))
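The same subset idea applied to the mtcars example, where the fitting condition is vs == 0. This is a sketch assuming dplyr's do(), which is superseded as of dplyr 1.0 but still available; with named arguments it returns one row per group with the models in a list column.

```r
library(dplyr)

# One model per cyl; lm()'s subset argument keeps only the vs == 0 rows for fitting
results <- mtcars %>%
  group_by(cyl) %>%
  do(mod = lm(mpg ~ wt + qsec, data = ., subset = vs == 0))

results                     # one row per cyl, models stored in the list column `mod`
sapply(results$mod, nobs)   # number of rows each model was fitted on
```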

To generate predicted values, try this:

# without newdata, predict() returns fitted values for the rows each model was fitted on
preds <- results %>% 
           do(data.frame(pred = predict(.$mod), Test = .[["Test"]]))
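Calling predict(.$mod) without newdata only covers the rows each model was fitted on, whereas the question asks for predictions on all rows. One way to get those from stored per-group models, sketched on mtcars, is to look up each group's model and pass the whole group as newdata. This assumes dplyr 0.8.1 or later for group_modify(); expect a rank-deficiency warning for the 4-cylinder group, whose model is fitted on a single row.

```r
library(dplyr)

# Fit one model per cyl on the vs == 0 rows only
results <- mtcars %>%
  group_by(cyl) %>%
  do(mod = lm(mpg ~ wt + qsec, data = ., subset = vs == 0))

# Predict every row (vs == 0 and vs == 1) with its group's model
preds <- mtcars %>%
  group_by(cyl) %>%
  group_modify(function(d, key) {
    # look up the model stored for this group's cyl value
    mod <- results$mod[[match(key$cyl, results$cyl)]]
    d$Pred <- predict(mod, newdata = d)
    d
  }) %>%
  ungroup()

nrow(preds)   # 32: all rows of mtcars
```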

Apply the filter before group_by:

mtcars %>% 
  filter(vs==0) %>%
  group_by(cyl) %>% 
  do({ 
    mod = lm(mpg ~ wt + qsec, data = .)
    Pred <- predict(mod)
    data.frame(Pred)
  })
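Filtering before group_by means the vs == 1 rows never reach predict(), so this approach returns predictions for the 18 vs == 0 rows of mtcars rather than all 32. A quick check:

```r
library(dplyr)

res <- mtcars %>%
  filter(vs == 0) %>%
  group_by(cyl) %>%
  do({
    mod <- lm(mpg ~ wt + qsec, data = .)
    data.frame(Pred = predict(mod))
  })

nrow(res)      # 18: only the vs == 0 rows
nrow(mtcars)   # 32
```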

Non-dplyr solution:

lapply(split(mtcars,mtcars$cyl), function(i){
  mod <- lm(mpg ~ wt + qsec, i[i$vs == 0,])
  Pred <- predict(mod)
  data.frame(Pred)
  })
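The base R version can also cover every row: fit on the vs == 0 rows of each group but pass the whole group as newdata, then let unsplit() put the rows back in the original mtcars order. A sketch (the 4-cylinder model is fitted on one row, so predict() may warn):

```r
# Base R: split by cyl, fit each model on that group's vs == 0 rows,
# then predict for every row of the group
pieces <- lapply(split(mtcars, mtcars$cyl), function(d) {
  mod <- lm(mpg ~ wt + qsec, data = d[d$vs == 0, ])
  d$Pred <- predict(mod, newdata = d)
  d
})

# unsplit() reassembles the pieces in the original row order of mtcars
out <- unsplit(pieces, mtcars$cyl)
head(out)
```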

I think I have an answer, close to my original attempt:

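results <- mtcars %>% 
    group_by(cyl) %>% 
    do({ 
        # filter(., vs == 0) hands lm() a data frame; ". %>% filter(...)" would hand it a function
        mod = lm(mpg ~ wt + qsec, data = filter(., vs == 0))
        print(mod)
        # predict for every row of the group, not only the vs == 0 rows
        Pred <- predict(mod, .)
        data.frame(. , Pred)
    })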

print(results, n=100)
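do() is superseded as of dplyr 1.0. A sketch of the same fit-on-subset, predict-on-all logic with group_modify(), assuming dplyr 0.8.1 or later:

```r
library(dplyr)

results <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ {
    mod <- lm(mpg ~ wt + qsec, data = filter(.x, vs == 0))  # fit on the vs == 0 rows only
    .x$Pred <- predict(mod, newdata = .x)                   # predict every row of the group
    .x
  }) %>%
  ungroup()

print(results, n = 100)
```

For DF, the same pattern would group by Test and fit Value ~ x + y on filter(.x, x > 0, x < 6, y > 0, y < 6).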
