
Creating categories of one column based on the categories of another in R

I have a data frame that looks like this:

           x   y
1      (0,4]   1
2      (0,4]   2
3      (0,4]   3
4      (0,4]   4
5      (4,5]   5
6     (5,10]   6
7     (5,10]   7
8     (5,10]   8
9     (5,10]   9
10    (5,10]  10
11   (10,20]  11
12   (10,20]  12
13   (10,20]  13
14   (10,20]  14
15   (10,20]  15
16   (10,20]  16
17   (10,20]  17
18   (10,20]  18
19   (10,20]  19
20   (10,20]  20
21   (20,40]  21
22   (20,40]  22
23   (20,40]  23
24   (20,40]  24
25   (20,40]  25
26   (20,40]  26
27   (20,40]  27
28   (20,40]  28
29   (20,40]  29
30   (20,40]  30

I want to partition the y column using the same irregular intervals that categorise the x column, without going through and hard-coding each cut-off point. Is there a way of doing this?

Thanks in advance.

Edit: desired output:

         x       y
1    (0,4]   (0,4]
2    (0,4]   (0,4]
3    (0,4]   (0,4]
4    (0,4]   (0,4]
5    (4,5]   (4,5]
6   (5,10]  (5,10]
7   (5,10]  (5,10]
8   (5,10]  (5,10]
9   (5,10]  (5,10]
10  (5,10]  (5,10]
11 (10,20] (10,20]
12 (10,20] (10,20]
13 (10,20] (10,20]
14 (10,20] (10,20]
15 (10,20] (10,20]
16 (10,20] (10,20]
17 (10,20] (10,20]
18 (10,20] (10,20]
19 (10,20] (10,20]
20 (10,20] (10,20]
21 (20,40] (20,40]
22 (20,40] (20,40]
23 (20,40] (20,40]
24 (20,40] (20,40]
25 (20,40] (20,40]
26 (20,40] (20,40]
27 (20,40] (20,40]
28 (20,40] (20,40]
29 (20,40] (20,40]
30 (20,40] (20,40]

Extract the numbers from the existing cutpoints:

library(stringr)
# Collect every integer found in the interval labels, then de-duplicate and sort
cutpoints = sort(as.numeric(unique(unlist(str_extract_all(df$x, pattern = "\\d+")))))

Cut using these cutpoints:

df$y = cut(df$y, breaks = cutpoints)

Using this reproducible data:

df = structure(list(x = structure(c(1L, 1L, 1L, 1L, 4L, 5L, 5L, 5L, 
5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("(0,4]", "(10,20]", "(20,40]", 
"(4,5]", "(5,10]"), class = "factor"), y = 1:30), .Names = c("x", 
"y"), class = "data.frame", row.names = c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30"))
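
The two steps above can also be sketched in base R, without stringr. This is only an illustration under a few assumptions: the small data frame is rebuilt with `cut()` so the snippet runs on its own, the column name `y_cat` is made up for the example, and the pattern is widened to accept a minus sign and a decimal part (labels written in scientific notation would still not be parsed).

```r
# Stand-in for the data frame above: x holds interval labels, y is numeric.
df <- data.frame(
  x = cut(1:30, breaks = c(0, 4, 5, 10, 20, 40)),
  y = 1:30
)

# The factor levels already contain every boundary, so scan those
# rather than all 30 rows.
labels <- levels(df$x)
numbers <- unlist(regmatches(labels, gregexpr("-?[0-9]+(\\.[0-9]+)?", labels)))
cutpoints <- sort(unique(as.numeric(numbers)))

# Reuse the recovered boundaries as breaks for y (y_cat is a hypothetical name).
df$y_cat <- cut(df$y, breaks = cutpoints)
```

Writing the result to a new column keeps the original numeric `y` available; assigning to `df$y` instead would overwrite it, as in the answer above.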

We can extract the last number from each label in 'x', convert it to numeric, take the unique values, and use them (with 0 prepended as the lowest boundary) as the breaks in cut. Here the data frame is named df1:

 cut(df1$y, breaks= c(0,sort(unique(as.numeric(sub(".*,(\\d+)\\D+$", "\\1", df1$x))))))
 #[1] (0,4]   (0,4]   (0,4]   (0,4]   (4,5]   (5,10]  (5,10]  (5,10]  (5,10] 
 #[10] (5,10]  (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20]
 #[19] (10,20] (10,20] (20,40] (20,40] (20,40] (20,40] (20,40] (20,40] (20,40]
 #[28] (20,40] (20,40] (20,40]
 #Levels: (0,4] (4,5] (5,10] (10,20] (20,40]
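
The one-liner above prepends 0 by hand, which works here because the first interval starts at 0. If that is not guaranteed, a possible variant is to read both bounds of every interval from the factor levels. The following is a rough sketch, not part of the original answer: `lower`, `upper`, `breaks` and `y_cat` are illustrative names, and the data frame is rebuilt with `cut()` so the snippet is self-contained.

```r
# Stand-in for df1: x holds interval labels, y is numeric.
df1 <- data.frame(
  x = cut(1:30, breaks = c(0, 4, 5, 10, 20, 40)),
  y = 1:30
)

labels <- levels(df1$x)

# Each label looks like "(a,b]": drop the first character and keep the text
# before the comma for the lower bound; keep the text after the comma, minus
# the last character, for the upper bound.
lower <- as.numeric(sub("^.([^,]+),.*$", "\\1", labels))
upper <- as.numeric(sub("^.*,([^,]+).$", "\\1", labels))

# All distinct boundaries, in increasing order, become the breaks.
breaks <- sort(unique(c(lower, upper)))
y_cat <- cut(df1$y, breaks = breaks)
```

Because the bounds are passed to `as.numeric()` as whole strings, this version does not depend on the bounds being non-negative integers.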

