Add a group column to a dataframe
I have a dataframe df like this:
V1 V2
1 13219 0
2 6358 1
3 4384 2
4 3359 3
5 2820 4
6 2466 5
7 2144 6
8 1941 7
9 1778 8
10 1550 9
and I would like to add a "group" column whose value depends on the value of df$V2.
When df$V2 == 0, group will be A
When df$V2 > 0 and df$V2 <= 5, group will be B
When df$V2 >= 6, group will be C
The idea would be to obtain something like this:
V1 V2 Grp
1 13219 0 A
2 6358 1 B
3 4384 2 B
4 3359 3 B
5 2820 4 B
6 2466 5 B
7 2144 6 C
8 1941 7 C
9 1778 8 C
10 1550 9 C
This seems straightforward at first, but googling around hasn't helped much. Any advice is much appreciated.
You could use cut or findInterval:
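# Option 1: cut() with right-closed intervals (-Inf,0], (0,5], (5,Inf]
df$Grp <- with(df, LETTERS[1:3][cut(V2, breaks=c(-Inf,0, 5, Inf),
labels=FALSE)])
# Option 2: findInterval() with the breaks shifted by 1 (relies on V2 holding whole numbers)
df$Grp <- with(df, LETTERS[1:3][findInterval(V2, c(-Inf,0, 5,Inf)+1)])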
df
# V1 V2 Grp
#1 13219 0 A
#2 6358 1 B
#3 4384 2 B
#4 3359 3 B
#5 2820 4 B
#6 2466 5 B
#7 2144 6 C
#8 1941 7 C
#9 1778 8 C
#10 1550 9 C
Or:
with(df, LETTERS[c(2,1,3)][1+(V2==0) + 2*(V2 >=6)])
#[1] "A" "B" "B" "B" "B" "B" "C" "C" "C" "C"
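If you would rather name the groups in the call itself, cut() can also take the labels directly. This is only a minimal sketch: it rebuilds the df from the question, and the as.character() call is there only because cut() returns a factor while the output above is character. For non-integer V2, findInterval() with left.open = TRUE (available in recent R versions) gives the same right-closed intervals without the +1 shift.

```r
# Same data as in the question
df <- data.frame(V1 = c(13219, 6358, 4384, 3359, 2820, 2466, 2144, 1941, 1778, 1550),
                 V2 = 0:9)

# Intervals are right-closed by default: (-Inf, 0], (0, 5], (5, Inf]
df$Grp <- as.character(cut(df$V2, breaks = c(-Inf, 0, 5, Inf),
                           labels = c("A", "B", "C")))

# Equivalent indexing with findInterval(); left.open = TRUE makes the
# intervals (lower, upper], so no integer assumption is needed
df$Grp2 <- LETTERS[1:3][findInterval(df$V2, c(-Inf, 0, 5, Inf), left.open = TRUE)]
```

Drop the as.character() call if a factor column suits the later analysis better.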
Using indexing, this can be done quite easily:
df$group <- NA
df$group[df$V2 == 0] <- "A"
df$group[df$V2 > 0] <- "B"
df$group[df$V2 >= 6] <- "C"
Note that the 3rd and 4th statements must be run in that order, because the "B" condition (V2 > 0) also matches the rows that should end up as "C". If you don't want the "C" assignment to have to run after the "B" assignment, define the index for "B" more precisely:
df$group[df$V2 > 0 & df$V2 < 6] <- "B"
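The three conditions can also be folded into a single expression, so that no statement depends on the order of another. A sketch using nested ifelse(), assuming V2 contains no negative or missing values (a negative value would fall through to "B" here):

```r
# Same data as in the question
df <- data.frame(V1 = c(13219, 6358, 4384, 3359, 2820, 2466, 2144, 1941, 1778, 1550),
                 V2 = 0:9)

# First test V2 == 0, then V2 < 6; everything left over is "C"
df$group <- ifelse(df$V2 == 0, "A",
                   ifelse(df$V2 < 6, "B", "C"))
```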
Results
V1 V2 group
1 13219 0 A
2 6358 1 B
3 4384 2 B
4 3359 3 B
5 2820 4 B
6 2466 5 B
7 2144 6 C
8 1941 7 C
9 1778 8 C
10 1550 9 C
Data
df <- read.csv(text="V1,V2
13219,0
6358,1
4384,2
3359,3
2820,4
2466,5
2144,6
1941,7
1778,8
1550,9")