Counting the total number of NAs in each column in R

Question

I am currently trying to count the number of NAs found in each of my dataset's columns.

I am running the following code:

  function(x, df1, df2, ncp, log = FALSE)

apply(Total_HousingData, 2, function(x) {sum(is.na(x))})

Here is my output:

        Id    MSSubClass      MSZoning   LotFrontage       LotArea        Street 
            0             0             0             0             0             0 
        Alley      LotShape   LandContour     Utilities     LotConfig     LandSlope 
            0             0             0             0             0             0 
 Neighborhood    Condition1    Condition2      BldgType    HouseStyle   OverallQual 
            0             0             0             0             0             0 
  OverallCond     YearBuilt  YearRemodAdd     RoofStyle      RoofMatl   Exterior1st 
            0             0             0             0             0             0 
  Exterior2nd    MasVnrType    MasVnrArea     ExterQual     ExterCond    Foundation 
            0             0             0             0             0             0 
     BsmtQual      BsmtCond  BsmtExposure  BsmtFinType1    BsmtFinSF1  BsmtFinType2 
            0             0             0             0             1             0 
   BsmtFinSF2     BsmtUnfSF   TotalBsmtSF       Heating     HeatingQC    CentralAir 
            1             1             1             0             0             0 
   Electrical      1stFlrSF      2ndFlrSF  LowQualFinSF     GrLivArea  BsmtFullBath 
            0             0             0             0             0             2 
 BsmtHalfBath      FullBath      HalfBath  BedroomAbvGr  KitchenAbvGr   KitchenQual 
            2             0             0             0             0             0 
 TotRmsAbvGrd    Functional    Fireplaces   FireplaceQu    GarageType   GarageYrBlt 
            0             0             0             0             0             0 
 GarageFinish    GarageCars    GarageArea    GarageQual    GarageCond    PavedDrive 
            0             1             1             0             0             0 
   WoodDeckSF   OpenPorchSF EnclosedPorch     3SsnPorch   ScreenPorch      PoolArea 
            0             0             0             0             0             0 
       PoolQC         Fence   MiscFeature       MiscVal        MoSold        YrSold 
            0             0             0             0             0             0 
     SaleType SaleCondition     SalePrice 
            0             0          1459

For some reason, all of the NA counts are being attributed to the SalePrice variable. When I look at other variables, there are plenty of NAs. I tried converting the appropriate variables to factors, but this still hasn't fixed the issue.

"Alley", for instance, should read 1, but its NA is not being picked up.

Here is a sample of the data:

 Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
  <dbl>      <dbl> <chr>    <chr>         <dbl> <chr>  <chr> <chr>    <chr>       <chr>    
1     1         60 RL       65             8450 Pave   NA    Reg      Lvl         AllPub   
2     2         20 RL       80             9600 Pave   NA    Reg      Lvl         AllPub   
3     3         60 RL       68            11250 Pave   NA    IR1      Lvl         AllPub   
4     4         70 RL       60             9550 Pave   NA    IR1      Lvl         AllPub   
5     5         60 RL       84            14260 Pave   NA    IR1      Lvl         AllPub   
6     6         50 RL       85            14115 Pave   NA    IR1      Lvl         AllPub

Answer 1

Try using sapply. This is the one-liner I use, with df as your data frame.

sapply(df, function(x) sum(is.na(x)))
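
As a minimal sketch of how this one-liner behaves, assuming a small made-up data frame (the name toy and its columns are illustrative and are not the asker's Total_HousingData):

```r
# Hypothetical toy data with one character and two numeric columns
toy <- data.frame(
  Id = c(1, 2, 3, 4),
  Alley = c(NA, "Grvl", NA, "Pave"),
  LotArea = c(8450, NA, 11250, 9550),
  stringsAsFactors = FALSE
)

# sapply() visits each column as its own vector and returns a named vector
na_counts <- sapply(toy, function(x) sum(is.na(x)))
print(na_counts)
```

Unlike apply(), which first converts the data frame to a matrix with as.matrix(), sapply() works on each column as it is stored.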

Answer 2

Another solution uses colSums(). is.na(df) returns a logical matrix in which each cell is TRUE where the corresponding value is NA. colSums() then sums the TRUE values in each column.

Total_HousingData <- data.frame(A = c(1, 2, NA, NA, NA), B = c(1, NA, 3, 4, 5), C = c(NA, 2, 3, NA, 5))

colSums(is.na(Total_HousingData))
#> A B C 
#> 3 1 2

Created on 2021-02-20 by the reprex package (v1.0.0)
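
One possibility worth checking, given that the sample in the question prints LotFrontage as <chr>, is that some missing values are stored as the literal string "NA" rather than as a true NA. is.na() does not flag such strings. This is only an assumption about the asker's data; neither answer confirms it. A sketch with made-up data (toy2 and its values are hypothetical):

```r
# Hypothetical data: "NA" in Alley is text, NA in SalePrice is a true missing value
toy2 <- data.frame(
  Alley = c("NA", "Grvl", "NA"),
  SalePrice = c(208500, NA, NA),
  stringsAsFactors = FALSE
)

# Counts only true missing values
true_na <- colSums(is.na(toy2))

# Counts true missing values plus cells whose text is exactly "NA"
with_strings <- sapply(toy2, function(x) sum(is.na(x) | x == "NA"))

print(true_na)
print(with_strings)
```

If the two counts differ for a column, that column contains "NA" as text, and the values would need to be converted to true NAs (or re-read with the right missing-value markers) before is.na() can see them.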


Answer 1: score 1, accepted, 2021-02-19 21:59:02

Answer 2: score 0, 2021-02-20 22:37:57
