
data_GAN Logistic Regression in R

I've been reading about logistic regression in R. It makes sense when the columns/variables actually mean something. My columns are A, B, and C. Column C has only 1's and 0's. How am I to do a regression with such a limited dataset? Any guidance or resources to read would be appreciated.

> library(Amelia)
> library(mlbench)
> library(dplyr)
> my_data<-read.csv("/Users/morenikeirving/GAN/data_GAN.csv")
> names(my_data)
[1] "A" "B" "C"
> head(my_data)
        A      B  C
1  4.4189 69.580 NA
2 13.2019 61.250 NA
3 25.6290 56.740  1
4 22.2943 68.860  1
5  0.2163 57.690 NA
6  0.2875 72.914 NA
> summary(my_data)
       A                B               C       
 Min.   : 0.000   Min.   :33.00   Min.   :1     
 1st Qu.: 1.226   1st Qu.:59.69   1st Qu.:1     
 Median : 5.897   Median :61.87   Median :1     
 Mean   : 7.450   Mean   :65.40   Mean   :1     
 3rd Qu.:12.600   3rd Qu.:69.58   3rd Qu.:1     
 Max.   :25.800   Max.   :95.00   Max.   :1     
                                  NA's   :2923  
> missmap(my_data, col=c("blue", "red"), legend=FALSE)
> my_data<-my_data %>% mutate(C = ifelse(is.na(C),0,C))
> missmap(my_data, col=c("blue", "red"), legend=FALSE)
> model <-glm(x~., data=my_data, family= binomial)
Error in eval(predvars, data, env) : object 'x' not found

(1) Are you asking how to write logistic regression code? Or (2) are you asking how to improve the quality of your dataset?

(1) https://stats.idre.ucla.edu/r/dae/logit-regression/

model <- glm(C ~ A + B, data = my_data, family = binomial)
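
As a minimal sketch of option (1), assuming the response is the 0/1 column C and the predictors are A and B: the real data_GAN.csv is not available here, so the data frame `sim_data` and the coefficients used to generate it are made up purely for illustration.

```r
# Simulated stand-in for the real data; values and coefficients are hypothetical.
set.seed(42)
n <- 500
sim_data <- data.frame(
  A = runif(n, 0, 26),
  B = runif(n, 33, 95)
)

# Generate a 0/1 outcome whose log-odds depend linearly on A and B.
log_odds <- -4 + 0.15 * sim_data$A + 0.03 * sim_data$B
sim_data$C <- rbinom(n, size = 1, prob = plogis(log_odds))

# Name the response on the left of ~; `.` means "all other columns".
model <- glm(C ~ ., data = sim_data, family = binomial)
summary(model)

# Predicted probabilities that C == 1, one per row.
pred <- predict(model, type = "response")
head(pred)
```

The error in the question, `object 'x' not found`, comes from the formula `x ~ .`: the data frame has no column named x. Putting C on the left-hand side, as above, avoids it.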

(2) If you have a small dataset with low quality data, there's not much you can do other than getting a new dataset or gathering more data.
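
Before deciding whether the data can support a model, it may help to count how many rows fall into each class. A small sketch, using a hypothetical data frame `toy` that only mimics the structure shown in the question:

```r
# Hypothetical toy data: C is 1 for some rows and NA for the rest.
toy <- data.frame(
  A = c(4.4, 13.2, 25.6, 22.3, 0.2, 0.3),
  B = c(69.6, 61.3, 56.7, 68.9, 57.7, 72.9),
  C = c(NA, NA, 1, 1, NA, NA)
)

# Count rows per class, listing missing values as their own category.
counts <- table(toy$C, useNA = "ifany")
print(counts)

# Share of rows whose outcome is missing.
na_share <- mean(is.na(toy$C))
print(na_share)
```

If every non-missing value is 1, as the summary in the question suggests, recoding NA to 0 assumes that a missing value means "no". That is worth checking against how the data were collected.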

You can consider resampling, but that is not always applicable and has its own set of problems when using
