Select a random sample of one column based on a groupby of 3 columns of an R dataframe [on hold]
I need to select a random sample of one column of an R dataframe, grouping by three other columns. This is somewhat similar to what is discussed here:
Groupby and Sample pandas
but I do not know how to replicate the Python code in R.
My bad, I hadn't posted what I have tried so far. I used the data.table package:
library(data.table)
sample_df <- df[, .SD[sample(x = .N, size = 50)], by = id]
However, I am not sure how to sample one column while using 3 other columns as the groupby.
Added sample masked data
df:
col1 col2 col3 col4
A1 ABC 1234 H
A1 ABC 1234 O2
A1 ABC 1234 N
B1 DEF 7787J C
B1 DEF 7787J CA
C1 HIJ 8989 CL
target df:
col1 col2 col3 col4
A1 ABC 1234 H or O2 or N
A1 ABC 1234 H or O2 or N
B1 DEF 7787J C
B1 DEF 7787J CA
C1 HIJ 8989 CL
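One possible reading of the target table is "keep at most two randomly chosen rows per (col1, col2, col3) group". A minimal data.table sketch under that assumption follows; the cap `n_per_group` and the names `dt` and `sample_dt` are illustrative and not part of the original post.

```r
library(data.table)

# Sample data copied from the masked table in the question
dt <- data.table(
  col1 = c("A1", "A1", "A1", "B1", "B1", "C1"),
  col2 = c("ABC", "ABC", "ABC", "DEF", "DEF", "HIJ"),
  col3 = c("1234", "1234", "1234", "7787J", "7787J", "8989"),
  col4 = c("H", "O2", "N", "C", "CA", "CL")
)

# Assumption: at most 2 rows per group, inferred from the target table
n_per_group <- 2L

set.seed(42)  # only to make the sketch reproducible

# .N is the number of rows in the current group; min() avoids requesting
# more rows than a small group contains (which would make sample() fail)
sample_dt <- dt[, .SD[sample(.N, min(.N, n_per_group))],
                by = .(col1, col2, col3)]
print(sample_dt)
```

Listing all three columns in `by = .(col1, col2, col3)` is what turns the single-column `by = id` attempt into a three-column groupby; `.SD` then holds only the remaining column, `col4`.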
Base R solution:
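# Shuffle the rows within each Position group; note that sample(x) on a data frame permutes its columns, not its rows
sample_df <- do.call("rbind", lapply(split(df, df$Position), function(x) x[sample(nrow(x)), , drop = FALSE]))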
Data:
df <- structure(list(Name = structure(c(4L, 1L, 2L, 6L, 3L, 5L, 4L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 6L, 3L, 5L, 2L, 6L, 3L, 5L),
.Label = c("Bob", "Dave", "Fred", "Jim", "Ray", "Steve"),
class = "factor"), Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("2019-10-19", "2019-10-20", "2019-10-21", "2019-10-22"),
class = "factor"), Load = c(900L, 900L, 900L, 850L, 850L, 850L, 789L, 789L, 789L, 960L,
960L, 909L, 909L, 909L, 991L, 991L, 991L, 720L, 717L, 717L, 717L),
Position = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L),
.Label = c("Defense", "Forward"), class = "factor")), row.names = c(NA, -21L), class = "data.frame")
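The same split/lapply/rbind pattern can probably be applied to the question's own columns by splitting on all three grouping columns at once. The sketch below assumes, as the target table suggests, a cap of two rows per group; `df_q`, `n_per_group`, and `sample_df_q` are hypothetical names chosen so that the `df` defined above is not overwritten.

```r
# Sample data copied from the masked table in the question
df_q <- data.frame(
  col1 = c("A1", "A1", "A1", "B1", "B1", "C1"),
  col2 = c("ABC", "ABC", "ABC", "DEF", "DEF", "HIJ"),
  col3 = c("1234", "1234", "1234", "7787J", "7787J", "8989"),
  col4 = c("H", "O2", "N", "C", "CA", "CL"),
  stringsAsFactors = FALSE
)

# Assumption: at most 2 rows per group, inferred from the target table
n_per_group <- 2L

set.seed(42)  # only to make the sketch reproducible

# Split on the combination of the three columns; drop = TRUE discards
# combinations that never occur in the data
groups <- split(df_q, list(df_q$col1, df_q$col2, df_q$col3), drop = TRUE)

# Within each group, pick row indices at random (never more than the group has)
sampled <- lapply(groups, function(g) {
  keep <- sample(nrow(g), min(nrow(g), n_per_group))
  g[keep, , drop = FALSE]
})

sample_df_q <- do.call(rbind, sampled)
rownames(sample_df_q) <- NULL  # discard the group-derived row names
print(sample_df_q)
```

Sampling row indices with `sample(nrow(g), ...)` rather than calling `sample()` on the data frame itself keeps each row intact, so `col4` stays paired with its group keys.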