
Count Total Number of NAs per Column in R

I am currently trying to count the number of NAs in each column of my dataset.

I am running the following code:

  function(x, df1, df2, ncp, log = FALSE)

apply(Total_HousingData, 2, function(x) {sum(is.na(x))})

Here is my output:

        Id    MSSubClass      MSZoning   LotFrontage       LotArea        Street 
            0             0             0             0             0             0 
        Alley      LotShape   LandContour     Utilities     LotConfig     LandSlope 
            0             0             0             0             0             0 
 Neighborhood    Condition1    Condition2      BldgType    HouseStyle   OverallQual 
            0             0             0             0             0             0 
  OverallCond     YearBuilt  YearRemodAdd     RoofStyle      RoofMatl   Exterior1st 
            0             0             0             0             0             0 
  Exterior2nd    MasVnrType    MasVnrArea     ExterQual     ExterCond    Foundation 
            0             0             0             0             0             0 
     BsmtQual      BsmtCond  BsmtExposure  BsmtFinType1    BsmtFinSF1  BsmtFinType2 
            0             0             0             0             1             0 
   BsmtFinSF2     BsmtUnfSF   TotalBsmtSF       Heating     HeatingQC    CentralAir 
            1             1             1             0             0             0 
   Electrical      1stFlrSF      2ndFlrSF  LowQualFinSF     GrLivArea  BsmtFullBath 
            0             0             0             0             0             2 
 BsmtHalfBath      FullBath      HalfBath  BedroomAbvGr  KitchenAbvGr   KitchenQual 
            2             0             0             0             0             0 
 TotRmsAbvGrd    Functional    Fireplaces   FireplaceQu    GarageType   GarageYrBlt 
            0             0             0             0             0             0 
 GarageFinish    GarageCars    GarageArea    GarageQual    GarageCond    PavedDrive 
            0             1             1             0             0             0 
   WoodDeckSF   OpenPorchSF EnclosedPorch     3SsnPorch   ScreenPorch      PoolArea 
            0             0             0             0             0             0 
       PoolQC         Fence   MiscFeature       MiscVal        MoSold        YrSold 
            0             0             0             0             0             0 
     SaleType SaleCondition     SalePrice 
            0             0          1459

For some reason, nearly all of the NAs are being counted under the SalePrice variable. When I look at other variables, there are plenty of NAs. I tried converting the appropriate variables to factors, but this still hasn't fixed the issue.

"Alley", for instance, should read 1, but its NA is not being picked up.

Here is a sample of the data:

 Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
  <dbl>      <dbl> <chr>    <chr>         <dbl> <chr>  <chr> <chr>    <chr>       <chr>    
1     1         60 RL       65             8450 Pave   NA    Reg      Lvl         AllPub   
2     2         20 RL       80             9600 Pave   NA    Reg      Lvl         AllPub   
3     3         60 RL       68            11250 Pave   NA    IR1      Lvl         AllPub   
4     4         70 RL       60             9550 Pave   NA    IR1      Lvl         AllPub   
5     5         60 RL       84            14260 Pave   NA    IR1      Lvl         AllPub   
6     6         50 RL       85            14115 Pave   NA    IR1      Lvl         AllPub   

Try using sapply(). This is the one-liner I use, with df as your data frame.

sapply(df, function(x) sum(is.na(x)))
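
To make the one-liner above concrete, here is a minimal sketch on a small made-up data frame. The column names A and B and the variable na_counts are placeholders for illustration, not the asker's housing data.

```r
# Toy data frame (hypothetical; stands in for the real dataset)
df <- data.frame(
  A = c(1, 2, NA, NA, NA),
  B = c("x", NA, "y", "z", "w"),
  stringsAsFactors = FALSE
)

# sapply() visits each column and returns a named vector of NA counts
na_counts <- sapply(df, function(x) sum(is.na(x)))
print(na_counts)

# Keep only the columns that contain at least one NA
print(na_counts[na_counts > 0])
```

Storing the result in a variable makes it easy to filter or sort the counts afterwards, which helps with a wide dataset like the one in the question.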

Another solution uses colSums(). is.na(df) returns a logical matrix with the same dimensions as the data frame, where a cell is TRUE if the corresponding value is NA. colSums() then sums the TRUE values in each column.

Total_HousingData <- data.frame(A = c(1, 2, NA, NA, NA), B = c(1, NA, 3, 4, 5), C = c(NA, 2, 3, NA, 5))

colSums(is.na(Total_HousingData))
#> A B C 
#> 3 1 2

Created on 2021-02-20 by the reprex package (v1.0.0)
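
If the counts still look too low after either approach, one thing that may be worth checking is whether the missing values are stored as the literal text "NA" rather than as real NA values. This is only an assumption; the question does not confirm it. The sketch below uses a made-up vector named alley to show the difference.

```r
# Hypothetical column where missing values were stored as the text "NA"
alley <- c("NA", "Grvl", "NA", "Pave")

# is.na() only detects true missing values, so the strings are not counted
n_before <- sum(is.na(alley))
print(n_before)

# Convert the placeholder text to a real NA, then count again
alley[alley == "NA"] <- NA
n_after <- sum(is.na(alley))
print(n_after)
```

In this sketch the first count is 0 and the second is 2, because only the second vector contains real missing values.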
