
Shifting breaks in the cut() R function

I have a series of values between 0 and 360, and I would like to cut them into groups several times, with the bins shifting a little each time. I'd like to do this in R.

For example:

d = runif(1000, 0, 360)
dd = rnorm(1000)
l = 10
breaks <- seq(0 , 360, l)
binned <- cut(d, breaks = breaks, ordered_result = TRUE)

Next, I want to keep the bins the same size, l, but shift them by two units. This means that my breaks start at 2 and end at 362. However, when I cut the data, the values between 0 and 2 are labeled NA because no bin covers them. To correct this, I need the last bin to wrap around, so that the final break (362) coincides with the start of the sequence (2). How can this be done in R?

You could conditionally add 360 to values below 2 when you apply cut the second time:
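To make the problem concrete, here is a minimal sketch that reproduces the NA values. The seed and the helper names `shifted`, `n_na` and `n_low` are introduced only for illustration and are not part of the original question.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
d <- runif(1000, 0, 360)
l <- 10
breaks <- seq(0, 360, l)

# Shift every break by 2: the bins now cover (2, 362]
shifted <- cut(d, breaks = breaks + 2, ordered_result = TRUE)

# Values at or below 2 fall outside every interval, so cut() returns NA for them
n_na  <- sum(is.na(shifted))
n_low <- sum(d <= 2)
identical(is.na(shifted), d <= 2)
```

The NA entries are exactly the observations that lie below the first shifted break, which is why they need to be moved into the last bin.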

new_binned <- cut(ifelse(d < 2, d + 360, d), breaks + 2)

This gives the correct bins with no NA values:

levels(new_binned)
#>  [1] "(2,12]"    "(12,22]"   "(22,32]"   "(32,42]"   "(42,52]"   "(52,62]"  
#>  [7] "(62,72]"   "(72,82]"   "(82,92]"   "(92,102]"  "(102,112]" "(112,122]"
#> [13] "(122,132]" "(132,142]" "(142,152]" "(152,162]" "(162,172]" "(172,182]"
#> [19] "(182,192]" "(192,202]" "(202,212]" "(212,222]" "(222,232]" "(232,242]"
#> [25] "(242,252]" "(252,262]" "(262,272]" "(272,282]" "(282,292]" "(292,302]"
#> [31] "(302,312]" "(312,322]" "(322,332]" "(332,342]" "(342,352]" "(352,362]"

which(is.na(new_binned))
#> integer(0)
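As a possible variation, the same wrap-around can be written with the modulo operator, so the threshold does not have to be spelled out in an ifelse(). This is only a sketch: `shift`, `period`, `wrapped`, `via_modulo` and `via_ifelse` are names assumed here, and the two results could in principle differ for a value lying within floating-point rounding of a break.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
d <- runif(1000, 0, 360)
shift  <- 2                       # assumed name for the offset of the bins
period <- 360                     # assumed name for the length of the circle
breaks <- seq(0, period, 10)

# (d - shift) %% period lies in [0, period); adding shift back moves
# values from [0, shift) up to [period, period + shift)
wrapped <- (d - shift) %% period + shift

via_modulo <- cut(wrapped, breaks + shift)
via_ifelse <- cut(ifelse(d < shift, d + period, d), breaks + shift)

# Compare the two approaches and count any remaining NA values
all(via_modulo == via_ifelse)
sum(is.na(via_modulo))
```

In both versions a value exactly equal to the shift would still be NA, because cut() uses intervals that are open on the left by default.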

EDIT

If you want the labels to wrap back round again, and need to generalize this to any shift, you would be best writing a function to do it:

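cut_wrap <- function(data, lowest = 0, highest = 360, break_every = 10) {
  # Breaks run from `lowest` to `lowest + highest` in steps of `break_every`
  breaks <- seq(0, highest, break_every) + lowest
  # Wrap values at or below `lowest` past `highest` so they land in the last bin
  # (<= rather than <, so a value exactly equal to `lowest` is not left as NA)
  x <- cut(ifelse(data <= lowest, data + highest, data), breaks)
  # Relabel the last bin so that its upper bound wraps back to `lowest`
  if (lowest == 0) lowest <- highest
  last <- sub(",.*$", paste0(", ", lowest, "]"), tail(levels(x), 1))
  levels(x) <- c(head(levels(x), -1), last)
  x
}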

This allows:

d = runif(1000, 0, 360)

d2 <- cut_wrap(d, 2)

d4 <- cut_wrap(d, 4)

levels(d2)
#>  [1] "(2,12]"    "(12,22]"   "(22,32]"   "(32,42]"   "(42,52]"   "(52,62]"  
#>  [7] "(62,72]"   "(72,82]"   "(82,92]"   "(92,102]"  "(102,112]" "(112,122]"
#> [13] "(122,132]" "(132,142]" "(142,152]" "(152,162]" "(162,172]" "(172,182]"
#> [19] "(182,192]" "(192,202]" "(202,212]" "(212,222]" "(222,232]" "(232,242]"
#> [25] "(242,252]" "(252,262]" "(262,272]" "(272,282]" "(282,292]" "(292,302]"
#> [31] "(302,312]" "(312,322]" "(322,332]" "(332,342]" "(342,352]" "(352, 2]"

levels(d4)
#>  [1] "(4,14]"    "(14,24]"   "(24,34]"   "(34,44]"   "(44,54]"   "(54,64]"  
#>  [7] "(64,74]"   "(74,84]"   "(84,94]"   "(94,104]"  "(104,114]" "(114,124]"
#> [13] "(124,134]" "(134,144]" "(144,154]" "(154,164]" "(164,174]" "(174,184]"
#> [19] "(184,194]" "(194,204]" "(204,214]" "(214,224]" "(224,234]" "(234,244]"
#> [25] "(244,254]" "(254,264]" "(264,274]" "(274,284]" "(284,294]" "(294,304]"
#> [31] "(304,314]" "(314,324]" "(324,334]" "(334,344]" "(344,354]" "(354, 4]"

Created on 2022-08-25 with reprex v2.0.2
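Assuming the goal is to summarise a second variable (such as dd from the question) within each set of shifted bins, the wrap-around idea could be applied over several shifts in a loop. The following is a sketch under that assumption. The names `period`, `width`, `shifts` and `bin_means` are hypothetical, and the mean is only one possible summary.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
d  <- runif(1000, 0, 360)
dd <- rnorm(1000)

period <- 360                     # assumed name for the length of the circle
width  <- 10                      # bin width, called l in the question
shifts <- seq(0, 8, by = 2)       # offsets to try, chosen for illustration

bin_means <- lapply(shifts, function(s) {
  brks <- seq(0, period, width) + s
  # Wrap values at or below the shift into the last bin before cutting
  bins <- cut(ifelse(d <= s, d + period, d), brks)
  # Mean of dd within each shifted bin
  tapply(dd, bins, mean)
})
names(bin_means) <- paste0("shift_", shifts)

# One vector of bin summaries per shift
lengths(bin_means)
```

Each element of `bin_means` has one entry per bin, so the results for different shifts can be compared side by side.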

cut(d, breaks = breaks + 2, ordered_result = TRUE) should do it.

