Count Total Number of NAs per Column in R
I am currently trying to count the number of NAs found in each of my dataset's columns.
I am running the following code:
apply(Total_HousingData, 2, function(x) sum(is.na(x)))
Here is my output:
Id MSSubClass MSZoning LotFrontage LotArea Street
0 0 0 0 0 0
Alley LotShape LandContour Utilities LotConfig LandSlope
0 0 0 0 0 0
Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual
0 0 0 0 0 0
OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st
0 0 0 0 0 0
Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation
0 0 0 0 0 0
BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2
0 0 0 0 1 0
BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir
1 1 1 0 0 0
Electrical 1stFlrSF 2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath
0 0 0 0 0 2
BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual
2 0 0 0 0 0
TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt
0 0 0 0 0 0
GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive
0 1 1 0 0 0
WoodDeckSF OpenPorchSF EnclosedPorch 3SsnPorch ScreenPorch PoolArea
0 0 0 0 0 0
PoolQC Fence MiscFeature MiscVal MoSold YrSold
0 0 0 0 0 0
SaleType SaleCondition SalePrice
0 0 1459
For some reason, all of the NA counts are being attributed to the SalePrice variable. When I look at other variables, there are plenty of NAs. I tried converting the appropriate variables to factors, but this still hasn't fixed the issue.
"Alley", for instance, should read 1, but its NAs are not being picked up.
Here is a sample of the data:
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
<dbl> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr>
1 1 60 RL 65 8450 Pave NA Reg Lvl AllPub
2 2 20 RL 80 9600 Pave NA Reg Lvl AllPub
3 3 60 RL 68 11250 Pave NA IR1 Lvl AllPub
4 4 70 RL 60 9550 Pave NA IR1 Lvl AllPub
5 5 60 RL 84 14260 Pave NA IR1 Lvl AllPub
6 6 50 RL 85 14115 Pave NA IR1 Lvl AllPub
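One thing worth checking, offered as an assumption rather than a confirmed diagnosis: when a tibble is printed, a genuine missing value in a <chr> column is shown as <NA>, whereas the sample above shows a bare NA in Alley. That suggests Alley may hold the literal string "NA", which is.na() does not count. The sketch below uses a hypothetical toy data frame (toy, with made-up values) to show the difference and one way to convert such strings into real missing values.

```r
# Hypothetical toy data: 'Alley' holds the literal string "NA", not a missing value
toy <- data.frame(
  Id = 1:4,
  Alley = c("NA", "Grvl", "NA", "Pave"),
  LotFrontage = c(65, NA, 68, 60),
  stringsAsFactors = FALSE
)

# is.na() only counts real missing values, so 'Alley' reports 0 here
na_counts <- colSums(is.na(toy))
print(na_counts)

# Count cells that are the string "NA" instead (%in% never returns NA itself)
na_strings <- sapply(toy, function(x) sum(x %in% "NA"))
print(na_strings)

# Turn those strings into real NA values, touching character columns only
toy[] <- lapply(toy, function(x) {
  if (is.character(x)) x[x %in% "NA"] <- NA
  x
})
print(colSums(is.na(toy)))
```

If the strings come in when the file is read, the na.strings argument of read.csv() controls which strings are turned into NA at import time.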
Try using sapply. This is the one-liner I use, with df as your data frame:
sapply(df, function(x) sum(is.na(x)))
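A minimal variant of the same one-liner, using vapply() instead of sapply(). vapply() takes a FUN.VALUE template, here integer(1), so R checks that every column returns exactly one integer and the result is always a named integer vector. The small data frame df is made up for illustration.

```r
# Hypothetical toy data frame with missing values in some columns
df <- data.frame(
  A = c(1, 2, NA, NA, NA),
  B = c(1, NA, 3, 4, 5),
  C = c(NA, 2, 3, NA, 5)
)

# Like sapply(), but FUN.VALUE = integer(1) guarantees one integer per column
na_per_column <- vapply(df, function(x) sum(is.na(x)), FUN.VALUE = integer(1))
print(na_per_column)
#> A B C
#> 3 1 2
```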
Another solution uses colSums(). is.na(df) gives you a logical matrix with the same dimensions as df, in which each cell is TRUE where the corresponding cell of df is NA. colSums() then sums up the TRUE values in each column.
Total_HousingData <- data.frame(A = c(1, 2, NA, NA, NA), B = c(1, NA, 3, 4, 5), C = c(NA, 2, 3, NA, 5))
colSums(is.na(Total_HousingData))
#> A B C
#> 3 1 2
Created on 2021-02-20 by the reprex package (v1.0.0)
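Building on colSums(), here is a small sketch that keeps only the columns that actually contain NAs, sorts them, and reports the share of missing cells with colMeans(), since the mean of a logical vector is the fraction of TRUE values. The toy data is re-created so the block runs on its own; column D is a hypothetical addition with no missing values.

```r
Total_HousingData <- data.frame(
  A = c(1, 2, NA, NA, NA),
  B = c(1, NA, 3, 4, 5),
  C = c(NA, 2, 3, NA, 5),
  D = c(1, 2, 3, 4, 5)  # hypothetical column with no missing values
)

na_counts <- colSums(is.na(Total_HousingData))

# Keep only the columns that contain NAs, largest count first
na_only <- sort(na_counts[na_counts > 0], decreasing = TRUE)
print(na_only)
#> A C B
#> 3 2 1

# Proportion of missing cells per column
na_share <- colMeans(is.na(Total_HousingData))
print(na_share)
```

With many columns, as in the housing data, filtering this way keeps the output to the handful of variables that need attention.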