Different behaviours between base::cbind and dplyr::bind_cols

When combining a data frame and a vector whose length differs from the data frame's number of rows, bind_cols gives an error, whereas cbind repeats rows. Why is this?

(And is it really wise to have that as the default behaviour of cbind?)

See the example data below.

# Example data
library(dplyr)  # provides tibble() and bind_cols()
x10 <- c(1:10)
y10 <- c(1:10)
xy10 <- tibble(x10, y10)

z20 <- c(1:20)

# get an error
xyz20 <- dplyr::bind_cols(xy10, z20)

# why does cbind repeat rows of xy10 to suit z20?
xyz20 <- cbind(xy10, z20)
xyz20

base::cbind is a generic function. Its behaviour differs between matrices and data frames.

For matrices, it warns if the objects have a different number of rows (see the Note below for more).

cbind(as.matrix(xy10), z20)
#      x10 y10 z20
# [1,]   1   1   1
# [2,]   2   2   2
# [3,]   3   3   3
# [4,]   4   4   4
# [5,]   5   5   5
# [6,]   6   6   6
# [7,]   7   7   7
# [8,]   8   8   8
# [9,]   9   9   9
#[10,]  10  10  10
#Warning message:
#In cbind(as.matrix(xy10), z20) :
#  number of rows of result is not a multiple of vector length (arg 2)

But for data frames, it actually creates a data frame from scratch. So the following two calls are equivalent, both giving a data frame of 20 rows:

cbind(xy10, z20)

## in this way, R's recycling rule steps in
data.frame(xy10[, 1], xy10[, 2], z20)

From ?cbind:

The 'cbind' data frame method is just a wrapper for 'data.frame(..., check.names = FALSE)'. This means that it will split matrix columns in data frame arguments, and convert character columns to factors unless 'stringsAsFactors = FALSE' is specified.
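The wrapper relationship quoted above can be checked directly. This is a minimal sketch in base R only; it assumes a plain data.frame stands in for the tibble xy10 from the question, and the object names via_cbind, via_data_frame and msg are illustrative. It also shows that the recycling only applies when the longer length is a whole multiple of the shorter one.

```r
# Base R only: a plain data.frame stands in for the tibble xy10.
xy10 <- data.frame(x10 = 1:10, y10 = 1:10)
z20 <- 1:20

# The data frame method of cbind forwards to data.frame(),
# so both calls recycle the 10 rows of xy10 to match length 20.
via_cbind <- cbind(xy10, z20)
via_data_frame <- data.frame(xy10, z20, check.names = FALSE)

nrow(via_cbind)        # 20
nrow(via_data_frame)   # 20

# Recycling needs a whole multiple: 15 is not a multiple of 10,
# so data.frame() (and hence cbind) stops with an error here.
msg <- tryCatch(
  cbind(xy10, 1:15),
  error = function(e) conditionMessage(e)
)
msg
```

So the silent repetition in the question happens only because 20 is an exact multiple of 10; with lengths that do not divide evenly, cbind on a data frame fails as well.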


Note: In non-data.frame cases, matrices are not allowed to grow. Only vectors are recycled or truncated.

## handling two vectors
## vector of shorter length is recycled
cbind(1:2, 1:4)
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    2
#[3,]    1    3
#[4,]    2    4

## handling two matrices
## has strict requirement on dimensions
cbind(as.matrix(1:2), as.matrix(1:4))
#Error in cbind(as.matrix(1:2), as.matrix(1:4)) : 
#  number of rows of matrices must match (see arg 2)

## handling a matrix and a vector
## vector of shorter length is recycled
cbind(1:2, as.matrix(1:4))
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    2
#[3,]    1    3
#[4,]    2    4

## handling a matrix and a vector
## vector of longer length is truncated
cbind(as.matrix(1:2), 1:4)
#     [,1] [,2]
#[1,]    1    1
#[2,]    2    2
#Warning message:
#In cbind(as.matrix(1:2), 1:4) :
#  number of rows of result is not a multiple of vector length (arg 2)

From ?cbind:

If there are several matrix arguments, they must all have the same number of rows....

If all the arguments are vectors, ..., values in shorter arguments are recycled to achieve this length...

When the arguments consist of a mix of matrices and vectors, the number of rows of the result is determined by the number of rows of the matrix arguments... vectors... are recycled or subsetted to achieve this length.
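If the silent recycling is unwanted, one option is to compare sizes before calling cbind. The following is only a sketch: cbind_strict is a hypothetical helper name, not part of base R or dplyr, and it is stricter than bind_cols because it rejects every size mismatch, including length-1 inputs.

```r
# Hypothetical helper (not in base R or dplyr): refuse to bind
# inputs whose number of rows (or length, for vectors) differ.
cbind_strict <- function(...) {
  # NROW() gives the row count of a data frame or matrix
  # and the length of a plain vector.
  sizes <- vapply(list(...), NROW, integer(1))
  if (length(unique(sizes)) > 1L) {
    stop("inputs have differing sizes: ", paste(sizes, collapse = ", "))
  }
  cbind(...)
}

xy10 <- data.frame(x10 = 1:10, y10 = 1:10)

# Same size: behaves like cbind.
ok <- cbind_strict(xy10, w10 = 11:20)
dim(ok)   # 10 3

# Different size: an error instead of recycled rows.
res <- tryCatch(
  cbind_strict(xy10, 1:20),
  error = function(e) conditionMessage(e)
)
res
```

The check sits in front of cbind rather than replacing it, so the matrix and data frame methods described above still do the actual binding.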
