Partition groups of data by group
I have the following dataset:
library(dplyr)

# "a" has 3 years, "b" has 6, "c" has 9; each year appears three times per location
df <- data.frame(Location = c(rep("a", times = 9), rep("b", times = 18), rep("c", times = 27)))
df$Year <- c(rep(1:3, times = 3), rep(1:6, times = 3), rep(1:9, times = 3))
df <- df %>%
  mutate(Predictor = seq_along(Location))  # running row index over the whole data
print(df)
Location  Year  Predictor
a         1     1
a         2     2
a         3     3
a         1     4
a         2     5
a         3     6
a         1     7
a         2     8
a         3     9
b         1     10
b         2     11
b         3     12
b         4     13
b         5     14
... 40 more rows
I want to split the above data frame into training and test sets. For the test set, I want to randomly sample a third of the years in each Location, while keeping all rows of a sampled year together. So if year 1 is selected for location "a", I want all three year-1 rows of "a" in the test set, and so on. My test set should look something like this:
Location  Year  Predictor
a         1     1
a         1     4
a         1     7
b         3     12
b         3     18
b         3     24
b         5     14
b         5     20
b         5     26
c         3     30
c         3     39
c         3     48
c         6     33
c         6     42
c         6     51
c         7     34
c         7     43
c         7     52
I found a similar question here, but that procedure would sample the same years, and the same number of years, from every location (and Year is numeric, not a factor). I want a different random sample of years from each location, with the number of sampled years proportional to the number of years in that location.

I would like to do this in dplyr if possible.
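Because each location has a different number of distinct years, "a third" means a different number of years per location. A small sketch that counts the distinct years per location and derives how many should go into the test set; it assumes a fractional third is rounded to the nearest whole year, and the names `years_per_location`, `n_years` and `n_test` are illustrative only:

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(
  Location = c(rep("a", times = 9), rep("b", times = 18), rep("c", times = 27)),
  Year     = c(rep(1:3, times = 3), rep(1:6, times = 3), rep(1:9, times = 3))
)
df$Predictor <- seq_len(nrow(df))

# One row per Location/Year pair, then count the pairs per location.
# Assumption: "a third" is rounded to the nearest whole number of years.
years_per_location <- df %>%
  distinct(Location, Year) %>%
  count(Location, name = "n_years") %>%
  mutate(n_test = round(n_years / 3))

print(years_per_location)
```

For this data that gives 1 test year for "a", 2 for "b" and 3 for "c", which matches the expected test set above.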
You can first create the distinct set of Location/Year combinations, sample some of them within each location, and then use the result in a semi_join on the original data. This could be done as:
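df %>%
  distinct(Location, Year) %>%                   # one row per Location/Year pair
  group_by(Location) %>%
  sample_frac(1/3) %>%                           # sample a third of the years in each location
  semi_join(df, ., by = c("Location", "Year"))   # keep every row of the sampled pairs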
#    Location Year Predictor
# 1         a    3         3
# 2         a    3         6
# 3         a    3         9
# 4         b    4        13
# 5         b    4        19
# 6         b    4        25
# 7         b    5        14
# 8         b    5        20
# 9         b    5        26
# 10        c    8        35
# 11        c    8        44
# 12        c    8        53
# 13        c    1        28
# 14        c    1        37
# 15        c    1        46
# 16        c    2        29
# 17        c    2        38
# 18        c    2        47
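The answer produces only the test set. Since the goal is a train/test split, one way to also get the training set is to keep the sampled Location/Year pairs and use them twice: `semi_join()` for the test rows and `anti_join()` for all remaining rows. This is a sketch building on the answer's approach; the names `test_years`, `test` and `train` and the seed value are arbitrary choices for illustration:

```r
library(dplyr)

df <- data.frame(
  Location = c(rep("a", times = 9), rep("b", times = 18), rep("c", times = 27)),
  Year     = c(rep(1:3, times = 3), rep(1:6, times = 3), rep(1:9, times = 3))
)
df$Predictor <- seq_len(nrow(df))

set.seed(1)  # arbitrary seed, only to make the example reproducible

# Sample a third of the distinct years within each location.
test_years <- df %>%
  distinct(Location, Year) %>%
  group_by(Location) %>%
  sample_frac(1/3) %>%
  ungroup()

# Rows whose Location/Year pair was sampled form the test set...
test <- semi_join(df, test_years, by = c("Location", "Year"))
# ...and every other row forms the training set.
train <- anti_join(df, test_years, by = c("Location", "Year"))
```

Joining on the sampled Location/Year pairs, rather than filtering on Year alone, is what keeps all rows of a year together while letting each location draw its own years.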