Converting frequency data for use in logistic regression in R

Simple question here: I have the following data and I need to get it in a format where I can run a logistic regression on it.

library(dplyr)  # provides recode(), used below

pvp <- rep(c("lib", "mod", "con"), 3)
pres <- c(rep("Bush", 3), rep("Clinton", 3), rep("Perot", 3))
count <- c(70, 195, 382, 324, 332, 199, 56, 101, 117)
df <- as.data.frame(cbind(pvp, pres, count))

df$pres <- recode(df$pres, 'Clinton' = '1', 'Bush' = '0', 'Perot' = '0')
df$count <- as.numeric(as.character(df$count))

It looks like this:

> df
  pvp pres count
1 lib    0    70
2 mod    0   195
3 con    0   382
4 lib    1   324
5 mod    1   332
6 con    1   199
7 lib    0    56
8 mod    0   101
9 con    0   117

I need to run a logistic regression predicting pres from pvp. Normally I think I would just use spread from tidyverse to get the data into a wide format. But here I have an issue with using key = pvp in that spread function. I can't collapse the categories either because some of them obviously correspond with pres = 1 and some with pres = 0. What solution can I use to get the data in a format where I can run a logistic regression on it?

Thanks in advance.

There is no need to expand the data. You can pass the counts to glm() through its "weights" argument when fitting the model, so that each row counts as that many observations.

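# Fit on the aggregated rows; each row is weighted by its frequency count
model_logit <- glm(pres ~ pvp, family = "binomial", weights = count, data = df)
# Predicted probability of pres = 1 (Clinton) for each pvp category
predictions <- predict(model_logit, data.frame(pvp = unique(df$pvp)), type = "response")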
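
If you want to convince yourself that the weighted fit is equivalent to expanding the data, here is a minimal sketch in base R. It rebuilds the data from the question, fits the model both ways, and compares the results. The names clinton, df_long, fit_weighted and fit_long are made up for this illustration and do not appear in the original code. Because pvp is the only predictor, the fitted probabilities should simply reproduce the observed share of Clinton voters in each group (for example 324 / 450 = 0.72 for "lib").

```r
# Rebuild the data from the question with base R only (no dplyr needed here)
df <- data.frame(
  pvp   = rep(c("lib", "mod", "con"), 3),
  pres  = rep(c("Bush", "Clinton", "Perot"), each = 3),
  count = c(70, 195, 382, 324, 332, 199, 56, 101, 117)
)
df$clinton <- as.integer(df$pres == "Clinton")  # 1 = Clinton, 0 = Bush or Perot

# Weighted fit on the 9 aggregated rows
fit_weighted <- glm(clinton ~ pvp, family = binomial, weights = count, data = df)

# Same model on expanded data: one row per respondent
df_long <- df[rep(seq_len(nrow(df)), times = df$count), c("pvp", "clinton")]
fit_long <- glm(clinton ~ pvp, family = binomial, data = df_long)

# Observed share of Clinton voters within each pvp group
observed <- tapply(df$count * df$clinton, df$pvp, sum) / tapply(df$count, df$pvp, sum)

# Fitted probabilities from the weighted model, in the same group order
fitted_prob <- predict(fit_weighted, data.frame(pvp = names(observed)), type = "response")

# Compare coefficients of the two fits, and fitted vs. observed probabilities
print(all.equal(unname(coef(fit_weighted)), unname(coef(fit_long)), tolerance = 1e-5))
print(round(cbind(observed = as.numeric(observed), fitted = unname(fitted_prob)), 4))
```

The coefficients agree between the two fits, but the residual degrees of freedom and the deviance do not, since the weighted fit sees 9 rows and the expanded fit sees one row per respondent. Keep that in mind if you compare models on those quantities.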
