How to account for NA in R dplyr

Question

Below are the packages, sample data, and script that I am running, followed by the schema. You will notice that two of the values are above 500 and therefore do not fit the schema. The desired result should take into account only the firms that fit the schema (those employing fewer than 500). When I run this on my larger data set (not the sample data set below), I get a result like the one at the bottom. In short, how would I modify the script so that it leaves out the entries above the schema's limit of 500 and therefore does not return a fifth row of NA?

library(dplyr)
library(data.table)
library(odbc)
library(DBI)
library(stringr)

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)

smbtest <- data.frame(firm,employment,small)

smbsummary2 <- smbtest %>%
  select(employment, small) %>%
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))

smb1     >= 0 and <100
smb2     >= 0 and <150
smb3     >= 0 and <250
smb4     >= 0 and <500
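
The question does not show how the small column was produced. As a rough sketch, assuming the schema's upper bounds act as cut points, base R's cut() reproduces the sample's small values and yields NA for any firm with 500 or more employees. The breaks vector and the name small_derived are assumptions made for illustration:

```r
employment <- c(1, 50, 90, 249, 499, 115, 145, 261, 210, 874, 1140)

# Assumed cut points, taken from the schema's upper bounds (100, 150, 250, 500).
breaks <- c(0, 100, 150, 250, 500)

# right = FALSE makes each interval [lower, upper), so values of 500 or more
# fall outside every interval and become NA.
small_derived <- as.integer(cut(employment, breaks = breaks, right = FALSE))

print(small_derived)
```

This is where the NA values in small would come from under that assumption: the two firms with 874 and 1140 employees have no class to fall into.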

smb      employment   worksites
 1           1000         20
 2           1500         22
 3           2500         25
 4           10000        29
 5           25000        NA
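
On the sample data, the extra row can be traced to grouping: dplyr treats NA as a group of its own. A minimal check using only the sample data from the question (the name counts is illustrative):

```r
library(dplyr)

smbtest <- data.frame(
  employment = c(1, 50, 90, 249, 499, 115, 145, 261, 210, 874, 1140),
  small      = c(1, 1, 1, 3, 4, 2, 2, 4, 3, NA, NA)
)

# count() groups by small; the NA values form their own group, listed last.
counts <- smbtest %>% count(small)

print(counts)
```

The two firms above 500 land in the NA group, which is why the summary gains a fifth row.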

Answer 1

Here, I believe this would help:

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)

smbtest <- data.frame(firm,employment,small)

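library(dplyr)
library(tidyr)  # drop_na() comes from tidyr, not dplyr

smbtest %>%
  select(employment, small) %>%
  drop_na() %>%                 # drop firms with no size class (small is NA)
  filter(employment < 500) %>%  # keep only firms that fit the schema
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = 'drop') %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))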

I've just added two calls:

drop_na(), from tidyr, which drops rows that contain NA
filter(employment < 500), which keeps only rows with employment below 500
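
Since drop_na() lives in tidyr, a dplyr-only variant may be convenient. The sketch below is not part of the original answer; it assumes that a missing small value is what marks a firm as outside the schema, and it removes those rows with filter(!is.na(small)) before grouping, so the cumulative sums never see them:

```r
library(dplyr)

smbtest <- data.frame(
  firm       = paste0("firm", 1:11),
  employment = c(1, 50, 90, 249, 499, 115, 145, 261, 210, 874, 1140),
  small      = c(1, 1, 1, 3, 4, 2, 2, 4, 3, NA, NA)
)

smbsummary <- smbtest %>%
  filter(!is.na(small)) %>%  # drop out-of-schema firms before summarising
  group_by(small) %>%
  summarise(employment = sum(employment), worksites = n(),
            .groups = "drop") %>%
  mutate(employment = cumsum(employment),
         worksites = cumsum(worksites))

print(smbsummary)
```

On the sample data this gives four rows, with cumulative employment of 141, 401, 860, and 1620 and cumulative worksites of 3, 5, 7, and 9.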

1 answer. Answer 1 accepted 2021-03-25 22:32:54