Convert numeric values into binary (0/1)
I have a data frame with counts of different kinds of fruit for different people, like below:
apple banana orange
Tim 3 0 2
Tom 0 1 1
Bob 1 2 2
How can I change it into a binary matrix, i.e. if a person has at least one of a given fruit, no matter how many, I record 1, and if not, I record 0? Like below:
apple banana orange
Tim 1 0 1
Tom 0 1 1
Bob 1 1 1
Here's your data.frame:
x <- structure(list(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L,
1L, 2L)), .Names = c("apple", "banana", "orange"), class = "data.frame", row.names = c("Tim",
"Tom", "Bob"))
And your matrix:
as.matrix((x > 0) + 0)
apple banana orange
Tim 1 0 1
Tom 0 1 1
Bob 1 1 1
I had no idea that a quick pre-bedtime posting would generate any discussion, but the discussions themselves are quite interesting, so I wanted to summarize here:
My instinct was to simply use the fact that, underneath, TRUE and FALSE in R are the numbers 1 and 0. If you check for equivalence (in a not-so-good way) with 1 == TRUE or 0 == FALSE, you'll get TRUE. My shortcut (which turns out to take more time than the correct, or at least more conceptually correct, way) was to just add 0 to my TRUEs and FALSEs, since I know that R will coerce the logical vectors to numeric.
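As a minimal sketch of the coercion described above (the names flags, m and binary are only illustrative and do not come from the original answer), arithmetic on a logical vector or matrix yields numbers and keeps the dimensions:

```r
# Logical values are coerced to numbers by arithmetic operators.
flags <- c(TRUE, FALSE, TRUE)

plus_double  <- flags + 0    # double vector
plus_integer <- flags + 0L   # integer vector
n_true       <- sum(flags)   # counts the TRUE values

# The same coercion applies element-wise to a logical matrix,
# and the dim/dimnames attributes are preserved.
m <- matrix(c(3, 0, 1, 0, 1, 2), nrow = 3,
            dimnames = list(c("Tim", "Tom", "Bob"), c("apple", "banana")))
binary <- (m > 0) + 0
print(binary)
```

Adding 0L instead of 0 gives an integer result, which may be preferable if the 0/1 values are later used as counts.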
The correct, or at least more appropriate, way would be to convert the output using as.numeric (I think that's what @JoshO'Brien intended to write). Unfortunately, that removes the dimensional attributes of the input, so you need to convert the resulting vector back to a matrix, which, as it turns out, is still faster than adding 0 as I did in my answer.
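The loss of dimensions can be seen in a small sketch on the fruit data. The names m, flat, binary and binary2 are illustrative, and the storage.mode variant at the end is a base R alternative that is not part of the original answer:

```r
m <- matrix(c(3, 0, 1, 0, 1, 2, 2, 1, 2), nrow = 3,
            dimnames = list(c("Tim", "Tom", "Bob"),
                            c("apple", "banana", "orange")))

flat <- as.numeric(m > 0)
print(dim(flat))   # NULL: as.numeric() dropped dim and dimnames

# Rebuild the matrix, reusing the original dimensions and names.
# Both as.numeric() and matrix() work in column-major order.
binary <- matrix(flat, nrow = nrow(m), dimnames = dimnames(m))

# Alternative sketch: change the storage mode in place, which keeps attributes.
binary2 <- m > 0
storage.mode(binary2) <- "double"
print(binary)
```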
Having read the comments and criticisms, I thought I would add one more option: using apply to loop through the columns and use the as.numeric approach. That is slower than manually re-creating the matrix, but slightly faster than adding 0 to the logical comparison.
x <- data.frame(replicate(1e4, sample(0:1e3)))
library(rbenchmark)
benchmark(X1 = {
            x1 <- as.matrix((x > 0) + 0)
          },
          X2 = {
            x2 <- apply(x, 2, function(y) as.numeric(y > 0))
          },
          X3 = {
            x3 <- as.numeric(as.matrix(x) > 0)
            x3 <- matrix(x3, nrow = 1001)
          },
          X4 = {
            x4 <- ifelse(x > 0, 1, 0)
          },
          columns = c("test", "replications", "elapsed",
                      "relative", "user.self"))
# test replications elapsed relative user.self
# 1 X1 100 116.618 1.985 110.711
# 2 X2 100 105.026 1.788 94.070
# 3 X3 100 58.750 1.000 46.007
# 4 X4 100 382.410 6.509 311.567
all.equal(x1, x2, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x3, check.attributes=FALSE)
# [1] TRUE
all.equal(x1, x4, check.attributes=FALSE)
# [1] TRUE
Thanks for the discussion y'all!
I usually use this method:
df[df > 0] = 1
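A small sketch of how this one-liner behaves on the fruit data, assuming non-negative counts (negative values would be left unchanged). Working on a copy named binary is an illustrative choice, not part of the original answer:

```r
df <- data.frame(apple = c(3L, 0L, 1L), banana = 0:2, orange = c(2L, 1L, 2L),
                 row.names = c("Tim", "Tom", "Bob"))

binary <- df               # work on a copy so the original counts are kept
binary[binary > 0] <- 1    # logical-matrix indexing replaces every positive count

print(class(binary))       # the result is still a data.frame
print(binary)
```

Unlike the as.matrix approaches, this keeps the data frame class and row names, which may matter if later code expects a data frame.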
You can use ifelse. It should work on both a matrix and a data frame; however, the result will be a matrix.
> df <- cbind(apple = c(3, 0, 1), banana = c(0, 1, 2), orange = c(2, 1, 2))
> df
     apple banana orange
[1,]     3      0      2
[2,]     0      1      1
[3,]     1      2      2
> ifelse(df > 0, 1, 0)
     apple banana orange
[1,]     1      0      1
[2,]     0      1      1
[3,]     1      1      1
Just use a comparison:
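d = t(matrix(c(3,0,2,0,1,1,1,2,2), 3))
d > 0   # logical matrix of the same shape as d
# as.numeric() drops the dimensions, so rebuild with the original row count
# (the t(matrix(..., ncol(d))) form only gives the right layout for this square, symmetric case)
matrix(as.numeric(d > 0), nrow(d))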
> pippo
person apple banana orange
1 Tim 1 0 2
2 Tom 0 1 1
3 Bob 1 2 2
> cols <- c("apple", "banana", "orange")
> lapply(cols, function(x) {pippo[,x] <<- as.numeric(pippo[,x] >= 1)})
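The same per-column conversion can be sketched without the global assignment operator <<-, by assigning the result of lapply back to the selected columns. The data frame below is rebuilt from the printed output above, and this variant is an alternative, not the original poster's code:

```r
pippo <- data.frame(person = c("Tim", "Tom", "Bob"),
                    apple  = c(1, 0, 1),
                    banana = c(0, 1, 2),
                    orange = c(2, 1, 2))

cols <- c("apple", "banana", "orange")

# Convert only the count columns; lapply() returns a list that is
# assigned back to the same columns, so no global assignment is needed.
pippo[cols] <- lapply(pippo[cols], function(v) as.numeric(v >= 1))
print(pippo)
```

The non-numeric person column is left untouched because it is not listed in cols.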