
In R, how to drop only a certain percentage of rows that meet certain conditions?

I have a dataset that has 40% females and 60% males. I also have a column for job role, where 85% are management. I want to drop rows randomly until I reach a maximum of 50% males and 50% with the job role management.

I can find several solutions for how to drop all rows that meet those conditions, but nothing that lets me specify dropping only a certain number or percentage of rows.

Can anyone suggest code that would achieve this?

Starting with some fake data:

set.seed(42)
# 1000 rows; prob gives the sampling probability of each level, in order
df1 <- data.frame(gender = sample(c("M", "F"), 1000, replace = TRUE, prob = c(0.4, 0.6)),
                  role = sample(c("mgmt", "other"), 1000, replace = TRUE, prob = c(0.85, 0.15)))

prop.table(table(df1))

#      role
#gender  mgmt other
#     F 0.529 0.094
#     M 0.324 0.053

We could look at the existing share of each gender/role combination and then sample with weights equal to the ratio of the share we want to the share we have:

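library(dplyr)
# Share of each gender/role combination, and how much to up- or down-weight it
props <- df1 %>%
  count(gender, role) %>%
  mutate(share = n / sum(n),
         desired = 0.25,
         weighting = desired / share)

# Attach the weights to every row, then draw a weighted random sample of 100 rows
df2 <- df1 %>%
  left_join(props, by = c("gender", "role")) %>%
  slice_sample(n = 100, weight_by = weighting) %>%
  select(gender, role)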

prop.table(table(df2))


#      role
#gender mgmt other
#     F 0.23  0.22
#     M 0.31  0.24
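
The weighted sample above draws a fixed number of rows and only approximates the target shares. If the goal is to drop just enough randomly chosen rows to bring one group down to a cap, a rough base R sketch could look like the following. The helper name drop_to_share and its arguments are hypothetical and not part of the approach above. It assumes max_share is below 1 and caps one condition at a time. Note that in the fake data the majority gender happens to be "F", so that is the level capped here.

```r
# Hypothetical helper: randomly drop just enough rows matching `cond`
# so that their share of the remaining rows is at most `max_share` (< 1).
drop_to_share <- function(df, cond, max_share) {
  n_total <- nrow(df)
  hit <- which(cond)  # row indices that meet the condition
  # Smallest integer k with (n_hit - k) / (n_total - k) <= max_share
  k <- ceiling(max(0, (length(hit) - max_share * n_total) / (1 - max_share)))
  if (k == 0) return(df)
  # sample.int avoids the sample() surprise when `hit` has length 1
  df[-hit[sample.int(length(hit), k)], , drop = FALSE]
}

set.seed(42)
df1 <- data.frame(gender = sample(c("M", "F"), 1000, replace = TRUE, prob = c(0.4, 0.6)),
                  role = sample(c("mgmt", "other"), 1000, replace = TRUE, prob = c(0.85, 0.15)))

# Cap management at 50%, then cap the majority gender at 50%
df_role <- drop_to_share(df1, df1$role == "mgmt", 0.5)
df_both <- drop_to_share(df_role, df_role$gender == "F", 0.5)

prop.table(table(df_both$gender))
prop.table(table(df_both$role))
```

Because the second call removes rows without looking at role, the management share can drift slightly away from 50% afterwards. Re-running the first cap, or alternating the two until both hold, is one way to tidy that up.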
