
How to convert numeric values to binary (using a threshold) in R?

I would like to know how to convert the values of a data frame in R from numeric to binary.

Data frame:

> head(predictionDB)
  TargetVar   X1         X2 X3        X4         X5         X6         X7        X8        X9       X10 X11       X12       X13
1       0 0.00 0.00000000  0 0.0000000 0.00000000 0.06666667 0.06666667 0.0000000 0.0000000 0.0000000 0.0 0.0000000 0.4666667
2       0 0.00 0.00000000  0 0.1212121 0.09090909 0.00000000 0.00000000 0.0000000 0.0000000 0.1818182 0.0 0.2727273 0.1818182
3       0 0.00 0.00000000  0 0.0000000 0.00000000 0.00000000 0.00000000 0.0000000 1.0000000 0.0000000 0.0 0.0000000 0.0000000
4       0 0.25 0.00000000  0 0.0000000 0.00000000 0.25000000 0.00000000 0.0000000 0.0000000 0.0000000 0.0 0.2500000 0.0000000
5       0 0.00 0.09090909  0 0.0000000 0.04545455 0.04545455 0.00000000 0.2727273 0.2272727 0.0000000 0.0 0.0000000 0.3181818
6       1 0.10 0.00000000  0 0.0000000 0.00000000 0.00000000 0.00000000 0.0000000 0.5000000 0.0000000 0.1 0.3000000 0.0000000

Desired output:

> head(predictionDB)
  TargetVar   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
1       0      0  0  0  0  0  1  1  0  0   0   0   0   1
2 ...

Many thanks in advance!

You can do the following, where df is your data frame (predictionDB in the question):

data.frame(df[1], (df[-1] > 0) * 1)

  TargetVar X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
1         0  0  0  0  0  0  1  1  0  0   0   0   0   1
2         0  0  0  0  1  1  0  0  0  0   1   0   1   1
3         0  0  0  0  0  0  0  0  0  1   0   0   0   0
4         0  1  0  0  0  0  1  0  0  0   0   0   1   0
5         0  0  1  0  0  1  1  0  1  1   0   0   0   1
6         1  1  0  0  0  0  0  0  0  1   0   1   1   0
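
The question title mentions a threshold, while the comparison above uses 0 as the cutoff. Here is a minimal sketch of the same idea with an arbitrary cutoff. The helper name binarize(), its threshold argument, and the small example data are assumptions for illustration, not part of the original answer:

```r
# Hypothetical helper: binarize every column except the first,
# using a strict "greater than threshold" rule.
binarize <- function(df, threshold = 0) {
  out <- df
  # (df[-1] > threshold) is a logical matrix; adding 0L turns it into 0/1 integers
  out[-1] <- (df[-1] > threshold) + 0L
  out
}

# Small made-up data frame, not the asker's data
df <- data.frame(TargetVar = c(0, 1),
                 X1 = c(0.00, 0.10),
                 X2 = c(0.25, 0.05))

res <- binarize(df, threshold = 0.08)
print(res)
```

With threshold = 0 the helper reproduces the comparison used in the answer. Whether the boundary value itself should count as 1 (>= instead of >) depends on your use case.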

Here are 5 ways.

First:

predictionDB[-1] <- +(predictionDB[-1] > 0)

Second:

predictionDB[-1] <- (predictionDB[-1] > 0) + 0L

Third:

predictionDB[-1] <- (predictionDB[-1] > 0)*1L

Fourth:

predictionDB[-1] <- as.integer(predictionDB[-1] > 0)

Fifth:

predictionDB[-1] <- ifelse(predictionDB[-1] > 0, 1, 0)

I once ran timing tests, and the first approach was the fastest by a small margin, but only on large data sets.
The fifth, ifelse, is consistently slower on both small and large data sets.
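
To check this on your own machine, here is a rough sketch using base R's system.time(). The data size, the number of repetitions, and the names in the ways list are arbitrary assumptions, and the timings will vary with hardware and R version:

```r
# Made-up test data: 1000 x 50 numeric values, roughly half of them zero
set.seed(1)
m <- matrix(runif(5e4) * rbinom(5e4, 1, 0.5), ncol = 50)
big <- data.frame(TargetVar = 0, m)

# The five approaches, each returning the binarized non-target columns
ways <- list(
  unary_plus  = function(d) +(d[-1] > 0),
  plus_zero   = function(d) (d[-1] > 0) + 0L,
  times_one   = function(d) (d[-1] > 0) * 1L,
  as_integer  = function(d) as.integer(d[-1] > 0),
  with_ifelse = function(d) ifelse(d[-1] > 0, 1, 0)
)

# as.integer() drops the matrix dimensions, so compare plain vectors
results <- lapply(ways, function(f) as.vector(f(big)))
same <- all(vapply(results, function(r) all(r == results[[1]]), logical(1)))
print(same)

# Elapsed seconds for 20 repetitions of each approach
timings <- sapply(ways, function(f) {
  system.time(for (i in 1:20) f(big))[["elapsed"]]
})
print(timings)
```

For more reliable numbers, a dedicated benchmarking package is usually preferable to a single system.time() call.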

Data.

predictionDB <- read.table(text = "
TargetVar   X1         X2 X3        X4         X5         X6         X7        X8        X9       X10 X11       X12       X13
1       0 0.00 0.00000000  0 0.0000000 0.00000000 0.06666667 0.06666667 0.0000000 0.0000000 0.0000000 0.0 0.0000000 0.4666667
2       0 0.00 0.00000000  0 0.1212121 0.09090909 0.00000000 0.00000000 0.0000000 0.0000000 0.1818182 0.0 0.2727273 0.1818182
3       0 0.00 0.00000000  0 0.0000000 0.00000000 0.00000000 0.00000000 0.0000000 1.0000000 0.0000000 0.0 0.0000000 0.0000000
4       0 0.25 0.00000000  0 0.0000000 0.00000000 0.25000000 0.00000000 0.0000000 0.0000000 0.0000000 0.0 0.2500000 0.0000000
5       0 0.00 0.09090909  0 0.0000000 0.04545455 0.04545455 0.00000000 0.2727273 0.2272727 0.0000000 0.0 0.0000000 0.3181818
6       1 0.10 0.00000000  0 0.0000000 0.00000000 0.00000000 0.00000000 0.0000000 0.5000000 0.0000000 0.1 0.3000000 0.0000000
", header = TRUE)
