R code to split data into distinct samples of equal size

Question

I am having trouble writing correct R code to draw 4 distinct samples of equal size from a dataset.

I need your help!

Thanks and regards, Reelina

Answer 1

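It really depends on what your goal is and what you are trying to do here. I will assume that, given a data frame, you want to create four subsets of equal size, each of which is a randomly sampled quarter of the data.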

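For demonstration purposes I use the Seatbelts data included in base R, because its number of rows is a multiple of 4. This solution uses only base R functions. For more involved data frame manipulation, I recommend taking a look at the dplyr package.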

# use seat belts data as example as it has nrow(x) %% 4 == 0
data(Seatbelts)
# generate a random sample of numbers 1:4 such that each occurs equally
ind = sample(rep(1:4,each = nrow(Seatbelts)/4))
# you could add that as a column to your data frame allowing the groups to be
# specified in formulae etc
# or if you want the four subsets
lapply(split(1:nrow(Seatbelts),ind), function(i) Seatbelts[i,])
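
The comment above about storing the grouping as a column could look roughly like this. It is only a sketch: it assumes that converting Seatbelts (a multivariate time series) with as.data.frame is acceptable, and the names df, group, subsets and mean_by_group are made up for illustration.

```r
# Seatbelts is a multivariate time series, so convert it to a data frame first
df <- as.data.frame(Seatbelts)

set.seed(1)  # only to make the shuffle reproducible
# 'group' is a hypothetical column name: labels 1..4, each used nrow(df)/4 times
df$group <- sample(rep(1:4, each = nrow(df) / 4))

# split() on a data frame returns a list of four data frames
subsets <- split(df, df$group)
group_sizes <- sapply(subsets, nrow)

# the same column can be used directly in a formula
mean_by_group <- aggregate(DriversKilled ~ group, data = df, FUN = mean)
```

Keeping the labels in the data frame avoids carrying a separate index vector around when the groups are needed again later.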

If your data is a vector, this is even easier:

x = runif(24)
ind = sample(rep(1:4,each = length(x)/4))
split(x,ind)

If you do not want random sampling, just create ind as

ind = rep(1:4,each = length(x)/4)

and split in the same way as before.

Be careful with cut, because it will not necessarily give you 4 subsets of equal size.

table(as.numeric(cut(x,4)))

# 1 2 3 4 
# 7 6 3 8

This is because cut divides the range of x into intervals of equal width; it does not divide x into groups of equal length.
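
If value-ordered groups of equal size are what is wanted (which is a different goal from the random subsets above), one possible workaround is to bin on ranks instead of on values. This is a sketch under that assumption, not part of the original answer; width_bins and rank_bins are illustrative names.

```r
set.seed(42)  # only for reproducibility
x <- runif(24)

# cut() with breaks = 4: four intervals of equal width, so counts may differ
width_bins <- cut(x, breaks = 4)

# rank-based bins: the sorted positions 1..24 are divided into 4 runs of 6
rank_bins <- ceiling(rank(x, ties.method = "first") * 4 / length(x))

table(width_bins)
table(rank_bins)
```

With ties.method = "first" every element gets a unique rank, so the four bins have equal counts whenever length(x) is a multiple of 4.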

Answer 2

How about this approach?

# Create data for example
x <- data.frame(id = 1:100, y = rnorm(100), z = rnorm(100))

# Returns a list with four equally sized distinct samples of the data
lapply(split(sample(nrow(x)), ceiling((1:nrow(x))/25)), function(i) x[i, ])
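
The one-liner above hard-codes a group size of 25. A possible generalisation to k groups, including row counts that are not a multiple of k, is sketched below. split_rows is a hypothetical helper written for this example, not a base R function.

```r
# split_rows() is a hypothetical helper, not part of base R
split_rows <- function(data, k) {
  n <- nrow(data)
  # labels 1..k recycled to length n, then shuffled:
  # group sizes differ by at most 1
  groups <- sample(rep(seq_len(k), length.out = n))
  split(data, groups)
}

set.seed(123)  # only for reproducibility
x <- data.frame(id = 1:102, y = rnorm(102))
parts <- split_rows(x, 4)
sizes <- sapply(parts, nrow)
```

Because every row receives exactly one label, the resulting samples are distinct and together cover the whole data frame.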

Answer 3

One can use the cut command:

x<-1:100
cutindex<-cut(x, breaks=4)

To rename the cut points, use the levels command:

levels(cutindex)<-c("A", "B", "C", "D")

Once the data is cut, I suggest using the group_by command from the dplyr package for further analysis.
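
As a rough illustration of that suggestion, the cut labels can be stored in a data frame and summarised per group. This sketch assumes the dplyr package is installed; df, bin and summary_by_bin are names invented for the example.

```r
library(dplyr)

df <- data.frame(x = 1:100)
# store the cut labels as a column ('bin' is an illustrative name)
df$bin <- cut(df$x, breaks = 4, labels = c("A", "B", "C", "D"))

# one row per bin with its size and mean
summary_by_bin <- df %>%
  group_by(bin) %>%
  summarise(n = n(), mean_x = mean(x))
```

Here the bins happen to be equally sized because x is evenly spaced; for other data, cut with breaks = 4 gives equal-width intervals, not equal counts.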

3 answers

Answer 1
1 2016-05-05 18:00:26

Answer 2
0 2016-05-05 11:53:50

Answer 3
0 2016-05-05 14:49:20
