

How to account for NAs in R dplyr

Below is the list of packages, the sample data, and the script that I am running; below that is the schema. You will notice that two of the employment values are above 500 (those firms have small set to NA) and therefore do not fit the schema. The desired result would only take into account the firms that fit the schema (fewer than 500 employees). When I run this on my larger data set (not the sample data set below), I get a result like the one at the bottom. In short, how would I modify the script so that it leaves out the entries with 500 or more employees and therefore does not return a fifth row of NA?

library(dplyr)
library(data.table)
library(odbc)
library(DBI)
library(stringr)

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)

smbtest <- data.frame(firm,employment,small)

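# Total employment and number of worksites per size class, then running totals
# so that each class also includes the smaller ones (matching the ">= 0" schema)
smbsummary2 <- smbtest %>%
  select(employment, small) %>%
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))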

smb1     >= 0 and <100
smb2     >= 0 and <150
smb3     >= 0 and <250
smb4     >= 0 and <500
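Because the schema is defined purely by employment cut-offs, small could also be derived from employment instead of being typed by hand. A minimal sketch, assuming each firm gets the smallest class whose cut-off it is below (which is how small is coded in the sample data): cut() with right = FALSE builds the half-open intervals [0, 100), [100, 150), [150, 250) and [250, 500), and any value of 500 or more falls outside every interval and becomes NA. The names breaks and small_derived are illustrative only.

```r
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)

# Cut-offs taken from the schema: <100, <150, <250, <500
breaks <- c(0, 100, 150, 250, 500)

# right = FALSE makes the intervals closed on the left, open on the right;
# labels = FALSE returns the interval index (1 to 4) and NA for 500 or more
small_derived <- cut(employment, breaks = breaks, right = FALSE, labels = FALSE)
print(small_derived)
```

On the sample data this reproduces the hand-coded small column, including NA for the 874 and 1140 firms, so a filter(!is.na(small)) ahead of group_by() would then leave those firms out.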

smb      employment   worksites
 1           1000         20
 2           1500         22
 3           2500         25
 4           10000        29
 5           25000        NA
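On the sample data, the extra row comes from the NA size class: group_by() keeps NA as a group of its own and sorts it last, so the firms with small = NA are summed into a fifth group. A quick diagnostic sketch (the name na_check is illustrative):

```r
library(dplyr)

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)
smbtest <- data.frame(firm, employment, small)

# count() groups the same way group_by() does, so NA appears as its own class
na_check <- smbtest %>% count(small)
print(na_check)

# The firms behind that NA class are the ones with 500 or more employees
smbtest %>% filter(is.na(small))
```

Here the NA class holds two firms (firm10 and firm11), which is exactly the row the question wants to get rid of.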

Here, I believe this would help:

library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)

smbtest <- data.frame(firm, employment, small)

smbtest %>%
  filter(employment < 500) %>%  # keep only firms that fit the schema (< 500 employees)
  select(employment, small) %>%
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites)) %>%
  drop_na()                     # drop any remaining row whose size class is NA

I've just added two lines of syntax:

  • drop_na() (from the tidyr package, so it needs library(tidyr))
  • filter(employment < 500)
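Where filter(employment < 500) sits in the pipeline matters. After the mutate(), employment holds cumulative totals, so the filter compares running sums with 500; before group_by(), it compares each firm's own headcount. A small sketch on the sample data contrasts the two; summarise_cumulative, late_filter and early_filter are hypothetical names introduced only for this comparison.

```r
library(dplyr)

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)
smbtest <- data.frame(firm, employment, small)

# Hypothetical helper: per-class totals, then running (cumulative) totals
summarise_cumulative <- function(df) {
  df %>%
    group_by(small) %>%
    summarise(employment = sum(employment), worksites = n(), .groups = "drop") %>%
    mutate(employment = cumsum(employment), worksites = cumsum(worksites))
}

# Filter on the cumulative totals: drops every class whose running sum is >= 500
late_filter <- smbtest %>% summarise_cumulative() %>% filter(employment < 500)

# Filter on individual firms before summarising: only the 500+ firms are removed
early_filter <- smbtest %>% filter(employment < 500) %>% summarise_cumulative()

print(late_filter)
print(early_filter)
```

On the sample data the late filter keeps only classes 1 and 2 (cumulative employment 141 and 401), while the early filter keeps all four schema classes (141, 401, 860 and 1620) and never creates the NA row.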

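Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.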

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM