Is it okay to use floating-point numbers as indices or when creating factors in R?

I don't mean numbers with fractional parts, which would clearly be odd. I mean numbers that really are integers (to the user, that is) but are stored as floating-point numbers.

For example, I've often used constructs like (1:3)*3 or seq(3,9,by=3) as indices, but these are actually stored as floating-point numbers (type double), not integers, even though to me they're really integers.

Another time this could come up is when reading data from a file; if the file represents the integers as 1.0, 2.0, 3.0, etc, R will store them as floating-point numbers.
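
A quick way to check how such values are actually stored is typeof(). The sketch below uses the two constructs mentioned above plus a small inline CSV standing in for a file that writes integers as 1.0, 2.0, 3.0. The column name x and the variable names are made up for illustration.

```r
# Storage type of the constructs from the question
idx_mult <- (1:3) * 3          # integer vector times a double constant
idx_seq  <- seq(3, 9, by = 3)  # seq() with double arguments

typeof(1:3)       # "integer"
typeof(idx_mult)  # "double"
typeof(idx_seq)   # "double"

# Inline stand-in (hypothetical data) for a file that stores
# whole numbers as 1.0, 2.0, 3.0
df <- read.csv(text = "x\n1.0\n2.0\n3.0")
typeof(df$x)      # "double"

# The values are whole numbers even though the storage type is double
all(idx_mult == round(idx_mult))  # TRUE
```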

(I posted an answer below with an example of why one should be careful, but it doesn't really address if simple constructs like the above can cause trouble.)

(This question was inspired by this question, where the OP created integers to use as coding levels of a factor, but they were being stored as floating-point numbers.)

It's always better to use an integer representation when you can, for instance (1L:3L)*3L or seq(3L,9L,by=3L).

I can come up with an example where floating-point representation gives an unexpected answer, but it depends on actually doing floating-point arithmetic (that is, on the fractional part of a number). I don't know if storing an integer directly in floating point and possibly then doing multiplication, as in the two examples in the original post, could ever cause a problem.
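
On that open point: R doubles follow IEEE 754 double precision, which represents every whole number up to 2^53 exactly, and the product of two such numbers is also exact as long as the result stays within that range. The sketch below checks this for constructs like those in the question. It illustrates the behaviour and is not a proof, and the variable names are made up.

```r
# Whole numbers stored as doubles are exact up to 2^53, so
# integer-valued multiplication in that range has no rounding error.
dbl_idx <- (1:1000) * 3              # stored as double
int_idx <- seq(3L, 3000L, by = 3L)   # stored as integer

typeof(dbl_idx)                          # "double"
typeof(int_idx)                          # "integer"
all(dbl_idx == int_idx)                  # TRUE: the values agree exactly
identical(as.integer(dbl_idx), int_idx)  # TRUE

# Beyond 2^53, consecutive whole numbers can no longer be told apart
2^53 + 1 == 2^53                         # TRUE
```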

Here's my somewhat forced example to show that floating-point numbers can give funny answers. I make two 3's that differ in their floating-point representation; the first element isn't exactly equal to three (on my system with R 2.13.0, anyway).

> (a <- c((0.3*3+0.1)*3,3L))
[1] 3 3
> a[1] == a[2]
[1] FALSE
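
Both elements print as 3 because the default print method shows only 7 significant digits. As a small sketch, printing more digits makes the difference visible, and all.equal compares with a tolerance instead of exactly. The exact digits may vary by platform, so no output is shown for the print call.

```r
a <- c((0.3 * 3 + 0.1) * 3, 3L)

print(a, digits = 17)   # shows the digits that the default print rounds away
a[1] - a[2]             # a tiny non-zero difference

# Exact comparison fails, comparison with a tolerance succeeds
a[1] == a[2]                    # FALSE
isTRUE(all.equal(a[1], a[2]))   # TRUE
```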

Creating a factor directly works as expected because factor calls as.character on its input, which gives the same string for both values.

> as.character(a)
[1] "3" "3"
> factor(a, levels=1:3, labels=LETTERS[1:3])
[1] C C
Levels: A B C

But using the vector as an index doesn't work as expected: when the values are coerced to integer they are truncated, so they become 2 and 3.

> trunc(a)
[1] 2 3
> LETTERS[a]
[1] "B" "C"
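
If the values are meant to be whole numbers, one way to guard against this is to round them before indexing, after checking that they really are close to whole numbers. The helper as_index below is a hypothetical name used only for illustration, and its tolerance is an arbitrary choice, not something from the example above.

```r
# Hypothetical helper: convert nearly-whole doubles to integer indices,
# failing loudly if a value is not close to a whole number.
as_index <- function(x, tol = 1e-8) {
  r <- round(x)
  if (any(abs(x - r) > tol)) {
    stop("values are not close to whole numbers")
  }
  as.integer(r)
}

a <- c((0.3 * 3 + 0.1) * 3, 3L)
LETTERS[a]             # "B" "C": truncation picks the wrong element
LETTERS[as_index(a)]   # "C" "C"
```

Rounding silently would also work here, but the explicit check keeps a value such as 1.5 from being quietly turned into an index.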

Constructs such as 1:3 are really integers:

> class(1:3)
[1] "integer"

Using a floating-point number as an index apparently involves truncation:

> foo <- 1:3
> foo
[1] 1 2 3
> foo[1.0]
[1] 1
> foo[1.5]
[1] 1
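
The truncation is toward zero, the same conversion that as.integer performs, rather than rounding to the nearest whole number. A few more cases illustrate this:

```r
foo <- 1:3

foo[1.9]     # 1: 1.9 is truncated to 1, not rounded to 2
foo[2.999]   # 2
foo[0.5]     # integer(0): 0.5 is truncated to 0, which selects nothing
foo[-1.5]    # 2 3: -1.5 is truncated to -1, which drops the first element

as.integer(c(1.9, 2.999, 0.5, -1.5))   # 1 2 0 -1
```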
