Add a group column to a data frame

Question

I have a data frame df like this:

      V1 V2
1  13219  0
2   6358  1
3   4384  2
4   3359  3
5   2820  4
6   2466  5
7   2144  6
8   1941  7
9   1778  8
10  1550  9

and I would like to add a "group" column whose value depends on df$V2:

When df$V2 == 0, the group is A
When 0 < df$V2 <= 5, the group is B
When df$V2 >= 6, the group is C

The goal is to obtain something like this:

      V1 V2 Grp
1  13219  0 A
2   6358  1 B
3   4384  2 B
4   3359  3 B
5   2820  4 B
6   2466  5 B
7   2144  6 C
8   1941  7 C
9   1778  8 C
10  1550  9 C

This seems straightforward at first, but searching online hasn't helped much. Any advice is much appreciated.

Answer 1

You could use cut or findInterval:

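# cut(): right-closed intervals (-Inf,0], (0,5], (5,Inf] give codes 1, 2, 3
df$Grp <- with(df, LETTERS[1:3][cut(V2, breaks=c(-Inf, 0, 5, Inf),
            labels=FALSE)])

# or findInterval(): the shifted breaks c(-Inf, 1, 6, Inf) give left-closed
# intervals [-Inf,1), [1,6), [6,Inf); the +1 shift assumes V2 holds integers
df$Grp <- with(df, LETTERS[1:3][findInterval(V2, c(-Inf, 0, 5, Inf)+1)])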

df
#      V1 V2 Grp
#1  13219  0   A
#2   6358  1   B
#3   4384  2   B
#4   3359  3   B
#5   2820  4   B
#6   2466  5   B
#7   2144  6   C
#8   1941  7   C
#9   1778  8   C
#10  1550  9   C
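On the question's integer data the cut and findInterval lines agree, but they treat non-integer values differently. The sketch below uses a hypothetical test vector v (values such as 0.5 and 5.5 do not appear in the question) to show where they diverge. It also tries findInterval with left.open = TRUE (available in R 3.3.0 and later), an alternative not used in the answer, which reproduces cut's right-closed intervals without the +1 shift.

```r
# Hypothetical test values, including non-integers not present in the question's data
v <- c(0, 0.5, 1, 5, 5.5, 6, 9)

# cut(): right-closed intervals (-Inf,0], (0,5], (5,Inf] -> codes 1, 2, 3
by_cut <- LETTERS[1:3][cut(v, breaks = c(-Inf, 0, 5, Inf), labels = FALSE)]

# findInterval() as in the answer: left-closed intervals on the shifted
# breaks c(-Inf, 1, 6, Inf)
by_find <- LETTERS[1:3][findInterval(v, c(-Inf, 0, 5, Inf) + 1)]

# Alternative (not from the answer): left.open = TRUE makes the intervals
# (-Inf,0], (0,5], (5,Inf] -> codes 0, 1, 2, so add 1 to index LETTERS
by_open <- LETTERS[1:3][findInterval(v, c(0, 5), left.open = TRUE) + 1]

print(data.frame(v, by_cut, by_find, by_open))
```

Here by_cut and by_open both give A, B, B, B, C, C, C, while by_find labels 0.5 as A and 5.5 as B, so the cut form is the safer choice if V2 might ever hold non-integers.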

Or:

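# 1 + (V2 == 0) + 2*(V2 >= 6) is 2 when V2 == 0, 3 when V2 >= 6 and 1 otherwise;
# LETTERS[c(2,1,3)] is c("B","A","C"), so those positions map to A, C and B
with(df, LETTERS[c(2,1,3)][1+(V2==0) + 2*(V2 >=6)])
#[1] "A" "B" "B" "B" "B" "B" "C" "C" "C" "C"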

Answer 2

Using logical indexing, this can be done quite easily:

df$group <- NA
df$group[df$V2 == 0] <- "A"
df$group[df$V2 > 0]  <- "B"
df$group[df$V2 >= 6] <- "C"

Note that the 3rd and 4th statements must be run in that order: the "B" condition (V2 > 0) also matches the rows with V2 >= 6, and the "C" assignment then overwrites them. If you don't want to depend on that order, define the "B" condition more precisely:

df$group[df$V2 > 0 & df$V2 < 6] <- "B"

Results:

      V1 V2 group
1  13219  0     A
2   6358  1     B
3   4384  2     B
4   3359  3     B
5   2820  4     B
6   2466  5     B
7   2144  6     C
8   1941  7     C
9   1778  8     C
10  1550  9     C
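With the tighter "B" condition the three conditions are mutually exclusive, so the statements can run in any order. A minimal sketch, assuming the question's data is rebuilt inline with data.frame, runs them in reverse order ("C" first) and still gets the same grouping. It also starts from NA_character_ instead of NA so the column is character from the outset (with plain NA, R coerces the logical column to character on the first string assignment, so the end result is the same).

```r
# Same data as in the question, built inline so the sketch is self-contained
df <- data.frame(V1 = c(13219, 6358, 4384, 3359, 2820, 2466, 2144, 1941, 1778, 1550),
                 V2 = 0:9)

# NA_character_ keeps the column character from the start
df$group <- NA_character_

# Mutually exclusive conditions, deliberately run in reverse order
df$group[df$V2 >= 6] <- "C"
df$group[df$V2 > 0 & df$V2 < 6] <- "B"
df$group[df$V2 == 0] <- "A"

print(df)
```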

Data:

df <- read.csv(text="V1,V2
13219,0
6358,1
4384,2
3359,3
2820,4
2466,5
2144,6
1941,7
1778,8
1550,9")
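As a cross-check, the following sketch loads the data above and confirms that the approaches from both answers produce the same labels on it. The names via_cut, via_find, via_arith and via_index are illustrative only; the identical() checks hold for this integer data but, as the findInterval comment notes, not necessarily for non-integer V2.

```r
# Load the sample data exactly as in the answer
df <- read.csv(text = "V1,V2
13219,0
6358,1
4384,2
3359,3
2820,4
2466,5
2144,6
1941,7
1778,8
1550,9")

# Answer 1: cut(), findInterval() and logical arithmetic
via_cut   <- with(df, LETTERS[1:3][cut(V2, breaks = c(-Inf, 0, 5, Inf), labels = FALSE)])
via_find  <- with(df, LETTERS[1:3][findInterval(V2, c(-Inf, 0, 5, Inf) + 1)])
via_arith <- with(df, LETTERS[c(2, 1, 3)][1 + (V2 == 0) + 2 * (V2 >= 6)])

# Answer 2: sequential logical indexing ("C" must come after "B")
via_index <- rep(NA_character_, nrow(df))
via_index[df$V2 == 0] <- "A"
via_index[df$V2 > 0]  <- "B"
via_index[df$V2 >= 6] <- "C"

# All four vectors should be identical on this integer data
stopifnot(identical(via_cut, via_find),
          identical(via_cut, via_arith),
          identical(via_cut, via_index))

df$Grp <- via_cut
print(df)
```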

Answer 1: score 4, accepted, 2015-03-26 08:56:06
Answer 2: score 3, 2015-03-26 09:28:30
