Discrete bins using cut()
I want to plot data from my data frame age.model (using xyplot() from lattice) based on discrete bins of the StartAge column.
I am using the following code:
# set up boundaries for intervals/bins
breaks <- c(0,3,4,5,6,8,13,15,17,18,19,20,22)
# specify interval/bin labels
labels <- c("<3", "3-4)", "4-5)","5-6)", "6-8)","8-13)", "13-15)","15-17)","17-18)","18-19)","19-20)",">=20")
# bucketing data points into bins
bins <- cut(age.model$StartAge, breaks, include.lowest = TRUE, right = FALSE, labels = labels)
# inspect bins
summary(bins)
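In the first argument of cut() I specify the column to discretize. However, the returned factor is a standalone vector and is not part of the data frame. How can I get the bins into the data frame?
Reproducible data from dput():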
structure(list(Height = c(0.207224416925809, -1.19429150954007,
0.0247585682642494, 0.023546515879641, 1.51423735121426, -1.09376538778425,
-0.125209484617016, -0.63639210765747, 0.305071992864995, -0.422021082477656
), Weight = c(-0.366133564723644, -1.06969961340686, -0.0793604259237282,
-0.708230200986797, 1.71593234004357, -0.685215310472794, -1.20353653394014,
-0.490399232488568, 0.742874184424376, -0.331519044995803), Training = c(19,
27, 27, 24, 35, 23, 15, 14, 47, 7), StartAge = c(13, 19, 20,
20, 14, 2, 8, 4, 17, 18)), row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10"), class = "data.frame")
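If you are exploring your data with xyplot, consider using equal.count() or shingle() in your code. As the first example below shows, one interesting finding is that for the lower StartAge bins the approximately linear relationship between Weight and Height does not seem to hold.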
# Starting with data in age.model
library(lattice)
xyplot(Weight ~ Height | equal.count(StartAge), age.model, type = c("p", "r"))
The default number of bins for equal.count() is 6. This is easy to change in order to explore other groupings:
# Create four groups of equal counts to explore
xyplot(Weight ~ Height | equal.count(StartAge, 4), age.model, type = c("p", "r"))
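To see which intervals such a call would condition on, it can help to compute them directly. The lattice documentation describes equal.count() as essentially a wrapper around co.intervals() from the graphics package, so the sketch below calls co.intervals() on the StartAge values from the question. The variable names start_age and four_groups are made up for this illustration, and the overlap of 0.5 is written out on the assumption that it matches the default equal.count() uses.

```r
# StartAge values copied from the dput() output in the question
start_age <- c(13, 19, 20, 20, 14, 2, 8, 4, 17, 18)

# co.intervals() lives in the graphics package, which ships with R.
# It returns one row per interval: column 1 = lower bound, column 2 = upper bound.
four_groups <- co.intervals(start_age, number = 4, overlap = 0.5)
colnames(four_groups) <- c("lower", "upper")

# Inspect the interval boundaries; neighbouring intervals share observations
print(four_groups)
```

Because the intervals overlap, an observation can appear in more than one panel of the conditioned plot.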
The shingle() function allows overlapping bins, as shown below.
# Create three groups that overlap with each other
# (named intervals so the cut() factor stored in bins is not overwritten)
intervals <- cbind(lower = c(0,8,16), upper = c(13,18,24))
xyplot(Weight ~ Height | shingle(StartAge, intervals), age.model, type = c("p", "r"))
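A quick way to see the effect of overlapping intervals is to count how many observations land in each one. The following is a rough base R sketch, not what shingle() does internally: it assumes both interval endpoints are inclusive, and the names start_age, overlapping and counts are hypothetical.

```r
# StartAge values copied from the dput() output in the question
start_age <- c(13, 19, 20, 20, 14, 2, 8, 4, 17, 18)

# Same three overlapping intervals as in the shingle() example
overlapping <- cbind(lower = c(0, 8, 16), upper = c(13, 18, 24))

# Count observations per interval, treating both endpoints as inclusive
# (an assumption made for this illustration)
counts <- apply(overlapping, 1, function(iv) {
  sum(start_age >= iv["lower"] & start_age <= iv["upper"])
})

print(counts)
# The counts add up to more than the number of rows because
# observations in the overlapping regions are counted twice
print(sum(counts) > length(start_age))
```

This is the main difference from cut(), where every observation falls into exactly one bin.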
To add the bins to the data frame, simply assign the factor returned by cut() to a new column:
age.model$bins <- bins
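Put together with the code from the question, the whole step might look like the sketch below. It rebuilds only the Training and StartAge columns of age.model from the dput() output so that it runs on its own; breaks and labels are taken from the question unchanged.

```r
# Reduced copy of age.model: only two columns from the dput() output
age.model <- data.frame(
  Training = c(19, 27, 27, 24, 35, 23, 15, 14, 47, 7),
  StartAge = c(13, 19, 20, 20, 14, 2, 8, 4, 17, 18)
)

# Boundaries and labels as defined in the question
breaks <- c(0, 3, 4, 5, 6, 8, 13, 15, 17, 18, 19, 20, 22)
labels <- c("<3", "3-4)", "4-5)", "5-6)", "6-8)", "8-13)", "13-15)",
            "15-17)", "17-18)", "18-19)", "19-20)", ">=20")

# cut() returns one factor value per row, so it can be stored as a column
age.model$bins <- cut(age.model$StartAge, breaks, labels = labels,
                      right = FALSE, include.lowest = TRUE)

# Check how many rows fall into each bin
print(table(age.model$bins))
```

The new column can then serve as a conditioning variable in lattice, for example xyplot(Weight ~ Height | bins, age.model) on the full data frame.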