
How can I split a data.frame into two equal halves in R using only base functions?

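The following splits a data.frame into two halves (of equal size when the number of rows is even) in R using base functions only:

# Where df is your data frame
n  <- floor(nrow(df) / 2)   # rows in the first half (rounded down if nrow is odd)
n1 <- n + 1                 # first row of the second half
n2 <- nrow(df)              # last row
df1 <- df[seq_len(n), ]
df2 <- df[n1:n2, ]
rm(n, n1, n2)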

This does not sort your data by any criterion, nor does it split at random: the rows are divided in their current order. It is useful if you need to split your data (e.g. into a derivation and a validation cohort) and are confident that the rows are not arranged systematically in a way that would disrupt your analyses.
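If you do this often, the same first-half/second-half split can be wrapped in a small function. This is only a sketch: split_in_half() and the cohort data below are hypothetical names made up for illustration. It uses integer division (%/%) so the first half is rounded down and the second half receives the extra row when the number of rows is odd, and drop = FALSE so a one-column data frame stays a data frame.

```r
# split_in_half() is a hypothetical helper, not part of base R.
# It keeps the current row order: no sorting, no random shuffling.
split_in_half <- function(df) {
  row_pos <- seq_len(nrow(df))  # 1, 2, ..., nrow(df)
  n <- nrow(df) %/% 2           # size of the first half, rounded down
  list(
    first  = df[row_pos <= n, , drop = FALSE],
    second = df[row_pos >  n, , drop = FALSE]  # gets the extra row if nrow(df) is odd
  )
}

# Made-up example data with an odd number of rows
cohort <- data.frame(id = 1:5, score = c(10, 12, 9, 15, 11))
halves <- split_in_half(cohort)
nrow(halves$first)   # 2
nrow(halves$second)  # 3
```

Comparing row positions against n, rather than indexing with 1:n, also avoids the 1:0 pitfall (1:0 is c(1, 0), not an empty vector) for data frames with fewer than two rows.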

Another base option:

n_rows <- nrow(data)
n <- ceiling(n_rows / 2)   # rows per half; the first half gets the extra row if n_rows is odd
split(data,
      rep(1:ceiling(n_rows / n),
          each = n,
          length.out = n_rows))

Output:

$`1`
  v1 v2
1  A  1
2  A  2

$`2`
  v1 v2
3  B  3
4  B  4

Data

data <- data.frame(v1 = c("A", "A", "B", "B"),
                   v2 = c(1,2,3,4))
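A variation on the same split() idea, sketched with a made-up 5-row data frame (data5 is a hypothetical name): build the grouping vector from two explicit half sizes with rep(..., times = ...). This makes it obvious which half receives the extra row when the row count is odd (the first half in this sketch).

```r
# Made-up example data with an odd number of rows
data5 <- data.frame(v1 = c("A", "A", "B", "B", "C"),
                    v2 = c(1, 2, 3, 4, 5))
n_rows <- nrow(data5)
sizes  <- c(ceiling(n_rows / 2), floor(n_rows / 2))  # 3 and 2 rows
groups <- rep(1:2, times = sizes)                     # 1 1 1 2 2
halves <- split(data5, groups)
lapply(halves, nrow)
```

Swapping ceiling() and floor() in sizes would give the extra row to the second half instead.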

You could use split() together with cut():

split(data, cut(1:nrow(data), 2, labels = FALSE))

Example, using the built-in sleep data set (20 rows):

split(sleep, cut(1:nrow(sleep), 2, labels = FALSE))

$`1`
   extra group ID
1    0.7     1  1
2   -1.6     1  2
3   -0.2     1  3
4   -1.2     1  4
5   -0.1     1  5
6    3.4     1  6
7    3.7     1  7
8    0.8     1  8
9    0.0     1  9
10   2.0     1 10

$`2`
   extra group ID
11   1.9     2  1
12   0.8     2  2
13   1.1     2  3
14   0.1     2  4
15  -0.1     2  5
16   4.4     2  6
17   5.5     2  7
18   1.6     2  8
19   4.6     2  9
20   3.4     2 10
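To see how cut() assigns rows when the number of rows is odd: given a single number for breaks, cut() divides the range of the row positions into that many equal-width intervals, and labels = FALSE returns the interval number for each position. For 5 rows the breaks fall at about 1, 3 and 5, and because intervals are closed on the right by default, the middle row lands in the first interval.

```r
# Group codes that cut() produces for 5 row positions
groups <- cut(seq_len(5), 2, labels = FALSE)
groups
# [1] 1 1 1 2 2

# The same call on 20 positions (as in the sleep example) gives two groups of 10
table(cut(seq_len(20), 2, labels = FALSE))
```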

Alternatively, index the rows directly. The second data.frame gets the extra row if the number of rows is odd.

index <- seq.int(nrow(df) / 2)

df[index, ]
df[-index, ]
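A quick check of the index approach on a made-up 5-row data frame: seq.int(2.5) behaves like 1:2.5 and returns the positions 1 and 2, so the negative index leaves rows 3 to 5 for the second part.

```r
df <- data.frame(x = 1:5, y = letters[1:5])  # made-up example data
index <- seq.int(nrow(df) / 2)               # 1 2
first  <- df[index, ]
second <- df[-index, ]
nrow(first)   # 2
nrow(second)  # 3
```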

The index approach is easy to adapt if you want to split the rows differently. For example, to draw the rows at random, use sample():

index <- sample(seq.int(nrow(df)), nrow(df) / 2)
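A sketch of the random version with a made-up 10-row data frame. It swaps in sample.int() with integer division (%/%) so the sample size is always a whole number, calls set.seed() (the value 42 is arbitrary) so the draw can be reproduced, and selects rows with a logical vector, which keeps each half in the original row order.

```r
set.seed(42)                                   # arbitrary seed, for a reproducible draw
df <- data.frame(x = 1:10, y = letters[1:10])  # made-up example data
index <- sample.int(nrow(df), nrow(df) %/% 2)  # 5 distinct row positions, drawn at random
in_first <- seq_len(nrow(df)) %in% index       # TRUE for the sampled rows
first  <- df[in_first, ]
second <- df[!in_first, ]
```

Every row ends up in exactly one of the two halves, so first and second together reproduce df.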
