
data.matrix() modifies first column of the data frame in R

I have a data frame like so:

>df
         classA  classB  classC  classD
item1         0       0      34       6
item2         2      12     267      12
item3        45      26       3    5876
item4        23     110     674      17
item5         1      14      98      17
>class(df)
[1] "data.frame"
>typeof(df)
[1] "list"
>is.factor(df)
[1] FALSE

When I convert it to a numeric matrix (to do some operations on it), only the values in the first column change:

>data.matrix(df)
          classA  classB  classC  classD
 item1         1       0      34       6
 item2         3      12     267      12
 item3        59      26       3    5876
 item4        34     110     674      17
 item5         2      14      98      17

I don't get it. Where do these numbers come from? How can I convert the data frame to a numeric matrix properly?

You should use as.matrix:

> df
  ClassA ClassB ClassC ClassD
1      0      0     34      6
2      2     12    267     12
3     45     26      3   5876
4     23    110    674     17
5      1     98     98     17
> as.matrix(df)
     ClassA ClassB ClassC ClassD
[1,]      0      0     34      6
[2,]      2     12    267     12
[3,]     45     26      3   5876
[4,]     23    110    674     17
[5,]      1     98     98     17
> class(as.matrix(df))
[1] "matrix"

I would guess that the first column of df is a factor (you can check with is.factor(df[,1])). For a factor, data.matrix returns the internal integer codes, i.e. the position of each value among the factor's levels, rather than the values printed as labels. That is why you get different numbers.
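
The effect can be reproduced with a small sketch. The column below is a hypothetical stand-in for classA, assuming the numbers were stored as a factor (for example because they were read in as text); the names x, df_demo and m are illustrative only.

```r
# Hypothetical column: numbers stored as a factor rather than as numeric.
x <- factor(c("0", "2", "45", "23", "1"))

levels(x)       # the distinct labels, sorted as strings
as.integer(x)   # internal codes: the position of each label in levels(x)

# A data frame with one factor column and one ordinary numeric column.
df_demo <- data.frame(classA = x, classB = c(0, 12, 26, 110, 14))

# data.matrix() replaces the factor column by its internal codes,
# while the numeric column is passed through unchanged.
m <- data.matrix(df_demo)
m[, "classA"]   # codes, not the original values
m[, "classB"]   # original values
```

In the question the codes are larger (for example 45 became 59), which would be consistent with the factor having many more levels than the five rows shown, although that cannot be confirmed from the post.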

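One way around this is to convert the first column to numeric before calling data.matrix, going through as.character so that the labels rather than the internal codes are converted. Note that as.matrix on its own gives a numeric matrix only when every column is already numeric; if a factor column is still present, it returns a character matrix.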
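
A minimal sketch of that conversion, assuming the affected columns are factors whose labels are plain numbers. The data frame df_demo and the helper names is_fac and m are made up for illustration and are not taken from the question.

```r
# Hypothetical data frame: classA is a factor, classB is numeric.
df_demo <- data.frame(
  classA = factor(c("0", "2", "45", "23", "1")),
  classB = c(0, 12, 26, 110, 14),
  row.names = paste0("item", 1:5)
)

# With the factor still in place, as.matrix() falls back to character.
is.character(as.matrix(df_demo))

# Find the factor columns and convert label -> character -> numeric.
# Calling as.numeric() directly on a factor would return the codes instead.
is_fac <- vapply(df_demo, is.factor, logical(1))
df_demo[is_fac] <- lapply(df_demo[is_fac],
                          function(col) as.numeric(as.character(col)))

# Now every column is numeric, so the matrix keeps the original values.
m <- data.matrix(df_demo)
m
```

If a label is not a valid number, as.numeric returns NA for it with a warning, so it is worth checking the result with anyNA(m) before using the matrix.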

