R replace values with bins
I have a df with integer values. For classification purposes, I'd like to replace it with a simpler one that holds pre-determined intervals instead of integers. How do I do this efficiently? An example is below:
df:
1 2 3
1 5 3 0
2 1 10 12
3 3 0 10
transforms into:
1 2 3
1 [3-5] [3-5] [0-2]
2 [0-2] [10-12] [10-12]
3 [3-5] [0-2] [10-12]
Is df a data frame or a matrix? The name suggests the former, but the way you describe it suggests the latter.
If it's a matrix:
df2 <- cut(df, c(0, 2, 5, 9, 12))
dim(df2) <- dim(df)
If it's a data frame:
df[] <- lapply(df, cut, c(0, 2, 5, 9, 12))
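One caveat with the calls above: by default cut() builds intervals that are open on the left, so a value of exactly 0 falls outside (0,2] and becomes NA. The following is a minimal sketch for the data frame case, assuming the column names a, b, c and the "[6-9]" label (neither appears in the question), that keeps 0 in the first bin and prints the labels the question asks for:

```r
# Example data from the question; the column names are illustrative.
df <- data.frame(a = c(5, 1, 3), b = c(3, 10, 0), c = c(0, 12, 10))

# Break points from the answer above; the labels (including "[6-9]")
# are an assumption based on the desired output in the question.
breaks <- c(0, 2, 5, 9, 12)
labels <- c("[0-2]", "[3-5]", "[6-9]", "[10-12]")

# include.lowest = TRUE puts 0 into the first bin instead of returning NA.
binned <- df
binned[] <- lapply(df, cut, breaks = breaks, labels = labels,
                   include.lowest = TRUE)
print(binned)
```

Each column of binned is a factor, so the interval order is preserved for later tabulation or modelling.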
In addition to Hong's answer, which is a good solution, I found something quite useful in ggplot2:
cut_interval - makes n groups with equal range
cut_number - makes n groups with approximately equal numbers of observations
cut_width - makes groups of a given width
In my opinion these functions offer more flexibility and are easier to understand than the base cut function. Note that they return factors, not a matrix.
You could use something like this:
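To make the difference between the three helpers concrete, here is a small sketch that applies each of them to the values from the question. The choices n = 3 and width = 2 are arbitrary assumptions for illustration:

```r
library(ggplot2)  # provides cut_interval(), cut_number(), cut_width()

x <- c(5, 3, 0, 1, 10, 12, 3, 0, 10)  # values from the question

by_range <- cut_interval(x, n = 3)                 # 3 bins covering equal ranges
by_count <- cut_number(x, n = 3)                   # 3 bins with roughly equal counts
by_width <- cut_width(x, width = 2, boundary = 0)  # bins of width 2 aligned at 0

# Compare how many observations land in each bin.
print(table(by_range))
print(table(by_count))
print(table(by_width))
```

All three return a factor of the same length as x, so any of them can be dropped into the workflow in place of cut.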
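library(reshape2)  # melt(), acast()
library(ggplot2)   # cut_width()
df <- matrix(c(5,3,0,1,10,12,3,0,10), nrow=3)
m.df <- melt(df)   # long format: Var1 (row), Var2 (column), value
m.df$value <- cut_width(m.df$value, width=2, boundary=0)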
This will return
Var1 Var2 value
1 1 1 (4,6]
2 2 1 (2,4]
3 3 1 [0,2]
4 1 2 [0,2]
5 2 2 (8,10]
6 3 2 (10,12]
7 1 3 (2,4]
8 2 3 [0,2]
9 3 3 (8,10]
If needed, you can cast it back to a square matrix:
df.bins <- acast(m.df, Var1~Var2)
Finally giving:
1 2 3
1 (4,6] [0,2] (2,4]
2 (2,4] (8,10] [0,2]
3 [0,2] (10,12] (8,10]
Levels: [0,2] (2,4] (4,6] (6,8] (8,10] (10,12]
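If the melt/acast round trip feels heavy, the same shape-preserving result can probably be reached in base R alone. This is a sketch under the assumption that a character matrix of bin labels is an acceptable result; the name m.bins is made up for the example:

```r
m <- matrix(c(5, 3, 0, 1, 10, 12, 3, 0, 10), nrow = 3)

# Explicit breaks every 2 units from 0 to 12; include.lowest keeps 0 in [0,2].
bins <- cut(m, breaks = seq(0, 12, by = 2), include.lowest = TRUE)

# cut() returns a plain factor without dimensions, so put the labels
# back into a character matrix with the original shape.
m.bins <- matrix(as.character(bins), nrow = nrow(m), ncol = ncol(m))
print(m.bins)
```

The trade-off is that the factor levels are lost in the character matrix; keep bins around if the level order is needed later.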