Split data into train and test sets stratified on labels

Question

I have a data frame (df) with two columns (Numbers and Letters). See the reproducible example:

Numbers <- c(2.370653, 3.811336, 5.255120, 6.501197, 7.848100, 9.343938, 10.843479, 12.164387, 13.476807, 14.922644, 16.419281, 17.664224, 19.112835, 20.660367, 21.962732, 23.213675)
Letters <- c("a", "b", "c", "c", "d", "a", "b", "d", "d", "a", "a", "c", "b", "c", "c", "c")
# data.frame() keeps Numbers numeric; as.data.frame(cbind(...)) would coerce it to character
df <- data.frame(Numbers, Letters)

I want to randomly split the data frame into two data frames of equal size, with the same number of each of the Letters in both. I have found the stratified() function, which takes a sample containing 50% of each of the Letters:

test <- stratified(df, "Letters", .5)

But this is not really the same as splitting the data frame into two data frames. I do not want any value of df$Numbers to appear in both data frames; I only want the same count of each of df$Letters in both. Can you help me?

Answer 1

Try this approach with rsample, which gets close to what you want. The comment from @AllanCameron is entirely valid: a letter that occurs three times cannot be split into two pieces of 1.5 for each sample, so the two halves cannot match exactly:

library(rsample)
# Fix the seed, then make a 50/50 split stratified on Letters
set.seed(123)
split_strat <- initial_split(df, prop = 0.5,
                             strata = 'Letters')
train_strat <- training(split_strat)
test_strat <- testing(split_strat)

Check the proportions:

table(train_strat$Letters)

a b c d 
2 2 3 2 

table(test_strat$Letters)

a b c d 
2 1 3 1
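
For comparison, the same idea can be sketched in base R without any package. This is only a minimal illustration, not part of the original answer: the names idx_by_letter, half_idx, half_a and half_b are made up for the example, and it assumes that rounding down within each letter is acceptable, so letters with an odd count leave their extra row in the second half.

```r
# Reproducible example data from the question
Numbers <- c(2.370653, 3.811336, 5.255120, 6.501197, 7.848100, 9.343938,
             10.843479, 12.164387, 13.476807, 14.922644, 16.419281,
             17.664224, 19.112835, 20.660367, 21.962732, 23.213675)
Letters <- c("a", "b", "c", "c", "d", "a", "b", "d",
             "d", "a", "a", "c", "b", "c", "c", "c")
df <- data.frame(Numbers, Letters)

set.seed(123)  # assumption: any seed works; fixed only for repeatability

# Group the row indices by letter
idx_by_letter <- split(seq_len(nrow(df)), df$Letters)

# Within each letter, shuffle the row indices and keep the first half (rounded down)
half_idx <- unlist(lapply(idx_by_letter, function(i) {
  shuffled <- i[sample.int(length(i))]   # safe even when a letter has one row
  head(shuffled, floor(length(i) / 2))
}), use.names = FALSE)

# The second half is the complement, so no row can appear in both halves
half_a <- df[half_idx, ]
half_b <- df[setdiff(seq_len(nrow(df)), half_idx), ]

table(half_a$Letters)
table(half_b$Letters)
```

With the counts in this example (a = 4, b = 3, c = 6, d = 3), half_a always holds 2, 1, 3 and 1 rows per letter and half_b holds 2, 2, 3 and 2, whichever rows the shuffle picks. Because half_b is built as the complement of half_a, the two data frames share no rows.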
