
Dynamic columns in R dataframes

I'm playing with R data frames and trying to figure out how they work. In the sample below, I use a one-row data frame to de-duplicate the elements of a vector. I know there are much better ways to do this, such as unique() or the hash library. This is more about learning how data frames work.

This first part works just fine, if the column name being added is a string:

> v = c(1, 2, 3, 10, 100, 50, 50, 100, 1, 2, 3, 10)
> d = data.frame(row.names=c('the row'))
> d
data frame with 0 columns and 1 rows
> for (x in v) { d[1,as.character(x)] = x}
> d
        1 2 3 10 100 50
the row 1 2 3 10 100 50

However, if I try to use a number as a column name, I get very strange behaviour:

> e = data.frame(row.names=c('the row'))
> for (x in v) { e[1,x] = x}
Error in `[<-.data.frame`(`*tmp*`, 1, x, value = 10) : 
  new columns would leave holes after existing columns
> e
        V1 V2 V3
the row  1  2  3

First of all, where did 'V1', 'V2', and 'V3' come from? Secondly, why doesn't this work? I mean, I can sort of work out that it's not happy that 10 is not the next number after 3, but other than that I don't know why this doesn't work.

Are columns only indexable as strings? Other restrictions that are worth knowing about?

Thanks in advance!

To see where the V1, V2, V3 names come from, check the source code of `[<-.data.frame`, line 139 (the exact line number may differ between R versions):

> deparse(`[<-.data.frame`)[139]
[1] "                new.cols <- paste0(\"V\", seq.int(from = nvars + "
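Because the line number is tied to one R version, a search of the deparsed source may be more robust. This is only a sketch: it assumes the installed R still builds the default names with a literal paste0("V", ...) call, as in the output above, and the names src and hits are just illustrative.

```r
# Deparse the replacement method into a character vector, one element per line.
src <- deparse(`[<-.data.frame`)

# Search for the literal call that builds the "V<n>" names.
# fixed = TRUE treats the pattern as plain text rather than a regex.
hits <- grep('paste0("V"', src, fixed = TRUE, value = TRUE)

# Print whatever was found; an empty result means this R version
# constructs the default names differently.
print(trimws(hits))
```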

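As for why e[1,10] = 10 fails when e has only 3 columns: a numeric column index is a position, not a name. Assigning to position 10 would leave positions 4 through 9 undefined, and `[<-.data.frame` refuses to create such holes. Only the next free position (4, in this case) can be added, and it gets an automatic name such as V4. This does not contradict your previous result ( d ): there the columns were added by name, so "10" is simply the name of the fourth column. Type d[,4] and see what happens.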
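The difference between indexing by name and by position can be checked directly. The following is a minimal sketch that reuses v, d and e from the question; the variable res is only there to capture the error without stopping the script.

```r
v <- c(1, 2, 3, 10, 100, 50, 50, 100, 1, 2, 3, 10)

# Character index: the string is a column NAME; a new column is created
# the first time a name is seen and overwritten afterwards.
d <- data.frame(row.names = "the row")
for (x in v) d[1, as.character(x)] <- x

print(names(d))   # the names are strings: "1" "2" "3" "10" "100" "50"
print(d[, 4])     # position 4 holds the column NAMED "10"
print(d[, "10"])  # the same column, looked up by name

# Numeric index: the number is a POSITION. Appending works only for
# the next free slot, ncol(e) + 1, and the column is auto-named.
e <- data.frame(row.names = "the row")
e[1, ncol(e) + 1] <- 10
print(names(e))   # "V1"

# Jumping ahead to position 10 would leave a gap, so R raises an error.
res <- tryCatch({
  e[1, 10] <- 10
  "ok"
}, error = function(err) conditionMessage(err))
print(res)
```

So columns are not indexable only by strings, but a number always means "the n-th column", never "the column called n".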
