NULLs in a data frame: should they be removed for logistic regression models, and how?
I have a data frame with two columns that are populated with NULL values:
'data.frame': 31337 obs. of 16 variables:
# $ ID : int 1 2 3 5 6 7 8 9 10 11 ...
# $ Target : int 0 0 0 0 0 0 0 0 0 0 ...
# $ band : chr "3. 35 to 44" "NULL" "NULL" "NULL" ...
# $ gender : chr "Male" "NULL" "Male" "NULL" ...
a) Do I remove the rows with "NULL" in R, or b) do I leave "NULL" as a separate category for logistic regression in R?
If the answer to a) is yes, how do I do it?
There are several things going on here with your question. To begin with, your columns hold the character string "NULL", which is not the same thing as R's NULL object. E.g.,
is.null(NULL)
[1] TRUE
is.null("NULL")
[1] FALSE
There is also a difference between NULL and NA. NULL represents a null or empty object; it is often returned by expressions and functions whose value is undefined. NA is a missing value (one that does not exist). Based on your context, I would replace your "NULL" values with NA. For a quick way to replace "NULL" with NA, see dplyr::na_if() (see the function's documentation).
If you use glm() to carry out your logistic regression model, there are several ways glm() handles missing data (NAs). You can control how it handles NAs with the argument na.action. Run ?glm in the console to pull up the help page for this function; it describes each of the argument's values.
To answer your question about removing NAs or using a dummy indicator for missing values, that's a matter of model intent. It is difficult to provide a general answer to such a broad topic without more details.
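As a rough illustration of the advice above, here is a minimal sketch that turns the "NULL" strings into NA and lets glm() drop the incomplete rows through na.action. The toy data, the column names (Target, age_band, gender) and the model formula are assumptions made for the example, not the asker's real data. The replacement is done per column in base R; dplyr::na_if(x, "NULL") on a single column should do the same job.

```r
# Toy data standing in for the real data frame (all values are made up).
set.seed(1)
n <- 200
data <- data.frame(
  Target   = rbinom(n, 1, 0.3),
  age_band = sample(c("2. 25 to 34", "3. 35 to 44", "NULL"), n, replace = TRUE),
  gender   = sample(c("Male", "Female", "NULL"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Turn the literal string "NULL" into a real missing value, column by column.
cols <- c("age_band", "gender")
data[cols] <- lapply(data[cols], function(x) {
  x[x %in% "NULL"] <- NA
  x
})

# na.action = na.omit makes glm() drop rows with NA in any model variable.
fit <- glm(Target ~ age_band + gender, data = data,
           family = binomial, na.action = na.omit)

# Rows actually used in the fit versus complete rows in the data.
nobs(fit)
sum(complete.cases(data[c("Target", cols)]))
```

The two counts at the end should agree, which is a quick way to check how many rows the model silently dropped.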
@jordan Fantastic advice. The data frame shrunk to 14% of its original size.
data <- na_if(data, "NULL")
data <- data[!is.na(data$age_band) & !is.na(data$gender), ]
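Since removing rows can discard a large share of the data, option b) from the question may be worth comparing. The sketch below, on made-up toy data with an assumed gender column and a hypothetical "Missing" label, keeps every row by treating the placeholder as its own factor level. Whether that is appropriate depends on the intent of the model, as noted in the answer.

```r
# Toy data standing in for the real data frame (all values are made up).
set.seed(2)
n <- 200
data <- data.frame(
  Target = rbinom(n, 1, 0.3),
  gender = sample(c("Male", "Female", "NULL"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Option a): drop the rows where gender is the placeholder string.
dropped <- data[data$gender != "NULL", ]

# Option b): keep every row and treat the placeholder as its own level.
data$gender_cat <- factor(ifelse(data$gender == "NULL", "Missing", data$gender))
fit_b <- glm(Target ~ gender_cat, data = data, family = binomial)

nrow(dropped)            # rows left after dropping
nobs(fit_b)              # rows used when "Missing" is a category
levels(data$gender_cat)  # includes the extra "Missing" level
```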