Nested if-else loops in R
I have a data frame named "crimes" which contains a "pre_rate" column that denotes the crime rate before a certain law is implemented. I would like to put each rate in a "rate_category" column using nested ifelse() statements. I have the following code:
crimes$rate_category =
with(crimes, ifelse(pre_rate > 0.26 && pre_rate < 0.87, 1,
ifelse(pre_rate > 1.04 && pre_rate < 1.94, 2,
ifelse(pre_rate > 2.03 && pre_rate < 2.96, 3,
ifelse(pre_rate > 3.10 && pre_rate < 3.82, 4,
ifelse(pre_rate > 4.20 && pre_rate < 11.00, 5, "NA"))))))
crimes
And here's a reproducible example:
pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80)
crimes = data.frame(pre_rate)
crimes
However, when I run this code on my original data frame, every value in the "rate_category" column is incorrectly set to 1. What is wrong with the code above?
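Instead of nesting ifelse statements, might I recommend using case_when? It is a bit easier to read and follow. But, as @Marius mentioned, your problem is using && instead of &. && is not vectorized. It only considers the first element of each side and returns a single TRUE or FALSE, and current versions of R signal an error instead when an operand has length greater than one. If the first value falls in the first interval (as 0.27 does in the example), the outer ifelse() therefore returns a single 1. That 1 is then recycled across the whole rate_category column.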
library(tidyverse)
crimes <- data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80))
crimes %>%
mutate(rate_category = case_when(pre_rate > 0.26 & pre_rate < 0.87 ~ 1,
pre_rate > 1.04 & pre_rate < 1.94 ~ 2,
pre_rate > 2.03 & pre_rate < 2.96 ~ 3,
pre_rate > 3.10 & pre_rate < 3.82 ~ 4,
pre_rate > 4.20 & pre_rate < 11.00 ~ 5))
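For comparison, here is the question's own nested ifelse() with && replaced by &. This sketch also replaces the string "NA" with a real NA. With the string, the nested result would usually be coerced to character values such as "1" and "2" rather than numbers.

```r
# Reproducible example from the question
crimes <- data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80))

# Same nesting as the question, with two changes:
#  * `&` compares element by element; `&&` yields a single TRUE/FALSE
#  * NA instead of the string "NA", so the column stays numeric
crimes$rate_category <- with(crimes,
  ifelse(pre_rate > 0.26 & pre_rate < 0.87, 1,
  ifelse(pre_rate > 1.04 & pre_rate < 1.94, 2,
  ifelse(pre_rate > 2.03 & pre_rate < 2.96, 3,
  ifelse(pre_rate > 3.10 & pre_rate < 3.82, 4,
  ifelse(pre_rate > 4.20 & pre_rate < 11.00, 5, NA))))))

crimes
```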
Why not define your lower and upper bounds in two vectors and then rely on indexing? With this method, there is no need to write pre_rate > num1 & pre_rate < num2 multiple times.
lowB <- c(0.26, 1.04, 2.03, 3.10, 4.2)
uppB <- c(0.87, 1.94, 2.96, 3.82, 11)
myCategory <- 1:5 ## this can be whatever categories you'd like
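crimes$rate_category <- sapply(crimes$pre_rate, function(x) {
  hit <- which(x > lowB & x < uppB)              # index of the interval containing x
  if (length(hit)) myCategory[hit[1]] else NA    # NA if x is in no interval
})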
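Here is a vectorized sketch of the same lower/upper-bound idea. It swaps in base R's findInterval(), which is not part of the original answer. The sketch assumes lowB is sorted in increasing order and that the intervals do not overlap. The value 1.0 is added to the question's data only to show a value that falls in a gap between intervals.

```r
# Bounds and categories as defined above (lowB must be sorted)
lowB <- c(0.26, 1.04, 2.03, 3.10, 4.2)
uppB <- c(0.87, 1.94, 2.96, 3.82, 11)
myCategory <- 1:5

# Question's data plus 1.0, which lies between 0.87 and 1.04
crimes <- data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80, 1.0))

# idx = index of the last lower bound strictly below each value (0 if none);
# with sorted, non-overlapping intervals this is the only candidate interval
idx <- findInterval(crimes$pre_rate, lowB, left.open = TRUE)

# Keep the candidate only if the value is also below that interval's upper bound
inside <- idx > 0
inside[inside] <- crimes$pre_rate[inside] < uppB[idx[inside]]

crimes$rate_category <- ifelse(inside, myCategory[pmax(idx, 1)], NA)
crimes
```

findInterval() handles the whole vector in one call, so there is no per-row function call.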
Instead of multiple nested ifelse() calls, a non-equi join and update on join can be used:
# OP's sample data set with one out-of-bounds value appended
crimes = data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80, 1.0))
library(data.table)
# specify categories, lower, and upper bounds
bounds <- data.table(
cat = 1:5,
lower = c(0.26, 1.04, 2.03, 3.10, 4.2),
upper = c(0.87, 1.94, 2.96, 3.82, 11)
)
# non-equi join and update on join
setDT(crimes)[bounds, on = .(pre_rate > lower, pre_rate < upper), rate_category := cat][]
   pre_rate rate_category
1:     0.27             1
2:     1.91             2
3:     2.81             3
4:     3.21             4
5:     4.80             5
6:     1.00            NA
Note that pre_rate values which lie outside all of the given intervals automatically get an NA rate_category.
You may use an algebraic approach to solve your problem. It should be faster than your nested ifelse():
pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80)
crimes = data.frame(pre_rate)
crimes$rate = (pre_rate > 0.26 & pre_rate < 0.87)*1 +
(pre_rate > 1.04 & pre_rate < 1.94)* 2 +
(pre_rate > 2.03 & pre_rate < 2.96)* 3 +
(pre_rate > 3.10 & pre_rate < 3.82)* 4 +
(pre_rate > 4.20 & pre_rate < 11.00)* 5
The idea is to get TRUE or FALSE values from each expression and multiply them by the number you want as the category. Because the intervals don't overlap, at most one term is non-zero. The only difference is that an unmatched value gets 0 here instead of NA, which you can of course change. Also, as mentioned in the comments, use & rather than && when you want a vectorized (element-by-element) result.
Output:
#> crimes
# pre_rate rate
#1 0.27 1
#2 1.91 2
#3 2.81 3
#4 3.21 4
#5 4.80 5
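Turning the zeros back into NA, as mentioned above, takes one extra line. The sketch below uses the question's data plus a hypothetical out-of-range value, 1.0. Unlike the code above, it reads pre_rate from the data frame with with() rather than from a separate vector.

```r
crimes <- data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80, 1.0))

# Each logical term becomes 0 or 1 when multiplied by its category number
crimes$rate <- with(crimes,
  (pre_rate > 0.26 & pre_rate < 0.87) * 1 +
  (pre_rate > 1.04 & pre_rate < 1.94) * 2 +
  (pre_rate > 2.03 & pre_rate < 2.96) * 3 +
  (pre_rate > 3.10 & pre_rate < 3.82) * 4 +
  (pre_rate > 4.20 & pre_rate < 11.00) * 5)

# Values outside every interval sum to 0; mark them as missing instead
crimes$rate[crimes$rate == 0] <- NA
crimes
```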
If none of your values fall into the gaps between the intervals, and you just want an index, you can use .bincode:
crimes$rate_category <- .bincode(crimes$pre_rate,
breaks = c(-Inf, 1, 2, 3, 4, Inf))
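The more commonly used cut() with labels = FALSE returns the same integer bin codes, and a quick check like the one below can confirm it. Both functions default to right-closed intervals (a, b], so 1.91 falls in (1, 2] and gets code 2.

```r
crimes <- data.frame(pre_rate = c(0.27, 1.91, 2.81, 3.21, 4.80))
breaks <- c(-Inf, 1, 2, 3, 4, Inf)

# Integer codes of the right-closed bins each value falls into
via_bincode <- .bincode(crimes$pre_rate, breaks = breaks)
via_cut     <- cut(crimes$pre_rate, breaks = breaks, labels = FALSE)

crimes$rate_category <- via_cut
crimes
```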
If you want specific values for each interval, you can use a rolling join via the data.table package:
library(magrittr)
library(data.table)
rate_category_by_pre_rate <-
data.table(rate_category = c("foo", "bar", "foobar", "baz", "foobie"),
pre_rate = c(1, 2, 3, 4, 11)) %>%
setkey(pre_rate)
crimes %>%
as.data.table %>%
setkey(pre_rate) %>%
rate_category_by_pre_rate[., roll = -Inf]
#> rate_category pre_rate
#> 1: foo 0.27
#> 2: bar 1.91
#> 3: foobar 2.81
#> 4: baz 3.21
#> 5: foobie 4.80
However, in your case, you may only need ceiling (i.e., round up the value of pre_rate and cap it at 5):
crimes$rate_category <- pmin(ceiling(crimes$pre_rate), 5)
#> pre_rate rate_category
#> 1 0.27 1
#> 2 1.91 2
#> 3 2.81 3
#> 4 3.21 4
#> 5 4.80 5
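The ceiling() shortcut ignores the gaps between the question's intervals, so it only agrees with them when no value falls in a gap. The check below uses hypothetical values 0.95 and 1.00. Both lie between 0.87 and 1.04, so they belong to no interval under the question's definition.

```r
lowB <- c(0.26, 1.04, 2.03, 3.10, 4.2)
uppB <- c(0.87, 1.94, 2.96, 3.82, 11)

# Hypothetical values; 0.95 and 1.00 fall in the gap (0.87, 1.04)
pre_rate <- c(0.27, 0.95, 1.00, 1.91)

# ceiling() shortcut: round up, cap at 5
ceiling_cat <- pmin(ceiling(pre_rate), 5)

# Interval lookup using the question's strict bounds (NA when in no interval)
interval_cat <- sapply(pre_rate, function(x) {
  hit <- which(x > lowB & x < uppB)
  if (length(hit)) hit[1] else NA
})

data.frame(pre_rate, ceiling_cat, interval_cat)
```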