Add columns to an empty data frame in R

I have searched extensively but have not found an answer to this question on Stack Overflow.

Let's say I have a data frame a.

I define:

a <- NULL
a <- as.data.frame(a)

If I try to add a column to this data frame like so:

a$col1 <- c(1,2,3)

I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "col1", value = c(1, 2, 3)) : 
    replacement has 3 rows, data has 0

Why is the row dimension fixed but the column dimension is not?

How do I change the number of rows in a data frame?

If I do this instead (putting the data into a list first and then converting it to a data frame), it works fine:

a <- NULL
a$col1 <- c(1,2,3)
a <- as.data.frame(a)

The row dimension is not fixed, but a data.frame is stored as a list of vectors that are constrained to have the same length. You cannot add col1 to a because col1 has three values (rows) and a has zero, which breaks that constraint. By default, R does not auto-vivify values (that is, pad the existing data with NA) when you try to extend a data.frame by adding a column that is longer than the data.frame. The second example works because a starts out as NULL, so a$col1 <- c(1,2,3) builds a plain list; when that list is converted, col1 is the only vector in the data.frame, so the data.frame is initialized with three rows.

If you want the data.frame to expand automatically, you can use the following function:
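The equal-length constraint can be checked directly. The following is a minimal sketch rather than a definitive recipe; the names `dims_before`, `msg`, and `b` are introduced here for illustration only.

```r
a <- data.frame()              # 0 rows, 0 columns, like the data frame in the question
dims_before <- dim(a)

# Assigning a 3-element column to a 0-row data frame is rejected;
# tryCatch captures the error message instead of stopping the script
msg <- tryCatch({
    a$col1 <- c(1, 2, 3)
    "no error"
}, error = function(e) conditionMessage(e))

# Building the data frame from the column itself sets the row count to 3
b <- data.frame(col1 = c(1, 2, 3))

# Any further column must then also supply 3 values
b$col2 <- c("x", "y", "z")
```

After this runs, `a` is still empty, `msg` holds the error text, and `b` has three rows and two columns.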

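cbind.all <- function(...) {
    # Coerce every argument to a matrix so nrow() and ncol() are always defined
    nm <- lapply(list(...), as.matrix)
    # Row count of the longest input
    n <- max(sapply(nm, nrow))
    # Pad each input with rows of NA up to n rows, then bind column-wise
    do.call(cbind, lapply(nm, function(x)
        rbind(x, matrix(NA, n - nrow(x), ncol(x)))))
}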

This fills the missing values with NA. You would use it like this: cbind.all(df, a). Note that the result is a matrix, so wrap the call in as.data.frame() if you need a data frame back.
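As a rough illustration of the padding behaviour, here is a small sketch that repeats the helper so it runs on its own. The inputs `long` and `short` are hypothetical and not part of the original answer.

```r
# Copy of the cbind.all helper from the answer above
cbind.all <- function(...) {
    nm <- lapply(list(...), as.matrix)
    n <- max(sapply(nm, nrow))
    do.call(cbind, lapply(nm, function(x)
        rbind(x, matrix(NA, n - nrow(x), ncol(x)))))
}

# Hypothetical inputs with different numbers of rows
long  <- data.frame(x = c(1, 2, 3))
short <- data.frame(y = c(10, 20))

m   <- cbind.all(long, short)  # 3 x 2 matrix; the third row of y is padded with NA
res <- as.data.frame(m)        # convert the matrix back to a data frame
```

Because everything passes through as.matrix(), mixing numeric and character inputs would coerce the whole result to character, so this approach seems best suited to inputs of a single type.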

You could also do something like the following, where I read data from multiple files, grab the column I want from each, and store it in the data frame. I check whether the data frame has any columns yet, and if it doesn't, I create a new one instead of triggering the error about a mismatched number of rows:

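# `files` is assumed to be a named character vector of file paths
readCounts = data.frame()

for(f in names(files)){
    d = read.table(files[[f]], header=TRUE, as.is=TRUE)
    # keep only the rounded NumReads column, named after the file
    d2 = round(data.frame(d$NumReads))
    colnames(d2) = f
    if(ncol(readCounts) == 0){
        # first file: start the data frame here instead of binding onto an empty one
        readCounts = d2
        rownames(readCounts) = d$Name
    } else{
        # later files: must have the same number of rows as the first
        readCounts = cbind(readCounts, d2)
    }
}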
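A possible alternative, in the spirit of the list-then-convert approach from the question, is to collect the columns in a list and convert once at the end, which avoids the empty-data-frame check. This is only a sketch: the `tables` list below is a hypothetical stand-in for the tables that read.table() would return, and the sample names and values are made up for illustration.

```r
# Hypothetical stand-ins for the per-file tables (not real data)
tables <- list(
    sampleA = data.frame(Name = c("g1", "g2"), NumReads = c(10.4, 3.6)),
    sampleB = data.frame(Name = c("g1", "g2"), NumReads = c(7.8, 0.2))
)

# One rounded NumReads vector per table, kept in a named list
cols <- lapply(tables, function(d) round(d$NumReads))

# Convert the list to a data frame in one step; list names become column names
readCounts <- as.data.frame(cols)
rownames(readCounts) <- tables[[1]]$Name
```

Like the loop above, this assumes every table lists the same rows in the same order.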

If you have an empty data frame, called df for example, I think another quite simple solution is the following:

df[1,]=NA  # add a temporary row of NA values
df[,'new_column'] = NA # add the new column, here called 'new_column'
df = df[0, , drop=FALSE] # delete the temporary row; drop=FALSE keeps a data frame even if only one column exists
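When the data frame has zero rows, a zero-length vector already satisfies the equal-length constraint, so the temporary row may not be needed at all. A minimal sketch, assuming a hypothetical empty data frame with two existing columns (`id` and `name` are illustrative names):

```r
# Hypothetical empty data frame: two columns, zero rows
df <- data.frame(id = integer(0), name = character(0))

# rep(NA_real_, nrow(df)) has length zero here, matching the zero rows;
# the same line also works unchanged if df later has rows
df$new_column <- rep(NA_real_, nrow(df))
```

The column type is fixed by the NA variant used, so NA_character_ or NA_integer_ could be substituted depending on what the column will hold.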

I hope this helps.
