Creating categories of one column based on the categories of another in R
So I have a data frame that looks like this:
x y
1 (0,4] 1
2 (0,4] 2
3 (0,4] 3
4 (0,4] 4
5 (4,5] 5
6 (5,10] 6
7 (5,10] 7
8 (5,10] 8
9 (5,10] 9
10 (5,10] 10
11 (10,20] 11
12 (10,20] 12
13 (10,20] 13
14 (10,20] 14
15 (10,20] 15
16 (10,20] 16
17 (10,20] 17
18 (10,20] 18
19 (10,20] 19
20 (10,20] 20
21 (20,40] 21
22 (20,40] 22
23 (20,40] 23
24 (20,40] 24
25 (20,40] 25
26 (20,40] 26
27 (20,40] 27
28 (20,40] 28
29 (20,40] 29
30 (20,40] 30
I want to partition the y column using the same irregular intervals that were used to categorise the x column, without going through and hard-coding each cut-off point. Is there a way of doing this?
Thanks in advance.
Edit: desired output:
x y
1 (0,4] (0,4]
2 (0,4] (0,4]
3 (0,4] (0,4]
4 (0,4] (0,4]
5 (4,5] (4,5]
6 (5,10] (5,10]
7 (5,10] (5,10]
8 (5,10] (5,10]
9 (5,10] (5,10]
10 (5,10] (5,10]
11 (10,20] (10,20]
12 (10,20] (10,20]
13 (10,20] (10,20]
14 (10,20] (10,20]
15 (10,20] (10,20]
16 (10,20] (10,20]
17 (10,20] (10,20]
18 (10,20] (10,20]
19 (10,20] (10,20]
20 (10,20] (10,20]
21 (20,40] (20,40]
22 (20,40] (20,40]
23 (20,40] (20,40]
24 (20,40] (20,40]
25 (20,40] (20,40]
26 (20,40] (20,40]
27 (20,40] (20,40]
28 (20,40] (20,40]
29 (20,40] (20,40]
30 (20,40] (20,40]
Extract the numbers from the existing interval labels to get the cut points:
library(stringr)
cutpoints = sort(as.numeric(unique(unlist(str_extract_all(df$x, pattern = "\\d+")))))
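If you would rather not depend on stringr, the same extraction can be done in base R. The sketch below is not part of the original answer: it swaps str_extract_all for regmatches() with gregexpr(), and parses only levels(df$x), since every row repeats one of a handful of labels. It rebuilds the example data with data.frame(), assuming the labels are exactly those shown in the question.

```r
# Example data mirroring the question (assumed to match the printed table)
df <- data.frame(
  x = factor(rep(c("(0,4]", "(4,5]", "(5,10]", "(10,20]", "(20,40]"),
                 times = c(4, 1, 5, 10, 10))),
  y = 1:30
)

# Only the distinct labels need parsing, so work on levels() rather than every row
labs <- levels(df$x)

# gregexpr() locates every run of digits; regmatches() extracts those substrings
nums <- regmatches(labs, gregexpr("[0-9]+", labs))

# Flatten, convert to numbers and keep each cut point once, in increasing order
cutpoints <- sort(unique(as.numeric(unlist(nums))))
print(cutpoints)
```

Like the stringr version, the "[0-9]+" pattern assumes the bounds are non-negative integers; decimals or negative numbers would need a different pattern.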
Then cut y using these cut points:
df$y = cut(df$y, breaks = cutpoints)
Using this reproducible data:
df = structure(list(x = structure(c(1L, 1L, 1L, 1L, 4L, 5L, 5L, 5L,
5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("(0,4]", "(10,20]", "(20,40]",
"(4,5]", "(5,10]"), class = "factor"), y = 1:30), .Names = c("x",
"y"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30"))
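One thing to watch with this approach: cut() builds its own labels from the breaks, formatting them with dig.lab significant digits (3 by default), so with larger bounds the new labels may not match the original x labels character for character (for example 1000 can be printed as 1e+03). A hedged sketch with hypothetical four-digit bounds, raising dig.lab so the labels stay in plain notation:

```r
# Hypothetical labels whose cut points have four digits
x <- factor(c("(0,1000]", "(1000,5000]"))
y <- c(500, 2500)

labs <- levels(x)
cutpoints <- sort(unique(as.numeric(unlist(regmatches(labs, gregexpr("[0-9]+", labs))))))

# With the default dig.lab = 3 these labels can come out in scientific notation
print(levels(cut(y, breaks = cutpoints)))

# A larger dig.lab keeps plain notation, so the labels line up with x
y_cat <- cut(y, breaks = cutpoints, dig.lab = 10)
print(levels(y_cat))
```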
Alternatively, we can extract the last numeric substring from each x label (the upper bound of the interval), convert it with as.numeric, keep the unique values, sort them, and use them (with 0 prepended as the lowest break) as the breaks in cut:
cut(df$y, breaks = c(0, sort(unique(as.numeric(sub(".*,(\\d+)\\D+$", "\\1", df$x))))))
#[1] (0,4] (0,4] (0,4] (0,4] (4,5] (5,10] (5,10] (5,10] (5,10]
#[10] (5,10] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20]
#[19] (10,20] (10,20] (20,40] (20,40] (20,40] (20,40] (20,40] (20,40] (20,40]
#[28] (20,40] (20,40] (20,40]
#Levels: (0,4] (4,5] (5,10] (10,20] (20,40]
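The call above hard-codes 0 as the lowest break. A possible variant, sketched here as a swapped-in technique rather than either answer's method, reads both bounds from each "(a,b]" label and then uses findInterval() to pick the matching interval, reusing the original x labels verbatim instead of letting cut() regenerate them. It assumes right-closed intervals with non-negative integer bounds that are contiguous (each lower bound equals the previous upper bound); the column name y_cat is hypothetical.

```r
# Example data mirroring the question
df <- data.frame(
  x = factor(rep(c("(0,4]", "(4,5]", "(5,10]", "(10,20]", "(20,40]"),
                 times = c(4, 1, 5, 10, 10))),
  y = 1:30
)

labs <- levels(df$x)

# Parse the lower and upper bound of each "(a,b]" label
lower <- as.numeric(sub("^\\(([0-9]+),.*$", "\\1", labs, perl = TRUE))
upper <- as.numeric(sub("^.*,([0-9]+)\\]$", "\\1", labs, perl = TRUE))

# Sort the labels by lower bound so that label i corresponds to interval i
ord <- order(lower)
labs <- labs[ord]
stopifnot(all(lower[ord][-1] == head(upper[ord], -1)))  # intervals must be contiguous
breaks <- c(lower[ord][1], upper[ord])

# left.open = TRUE treats each interval as (a, b], matching the bracket notation
idx <- findInterval(df$y, breaks, left.open = TRUE)

# Indices outside 1..length(labs) (values beyond the outer bounds) become NA
df$y_cat <- factor(idx, levels = seq_along(labs), labels = labs)
print(table(df$y_cat))
```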