
Assign a group number based on another column by group in R

This is probably very straightforward, but I can't figure out a way to do it. I have some data that looks like this:

domain  difference
xxxx    0
xxxx    2
xxxx    14
xxxx    3
xxxx    7
xxxx    2
yyyy    6
yyyy    5
yyyy    13
yyyy    10
zzzz    2
zzzz    5
zzzz    1
zzzz    15
zzzz    16
zzzz    8
zzzz    9

I want it to look like this:

domain  difference  grp
xxxx    0           1
xxxx    2           1
xxxx    14          2
xxxx    3           2
xxxx    7           2
xxxx    2           2
yyyy    6           1
yyyy    5           1
yyyy    13          1
yyyy    10          1
zzzz    2           1
zzzz    5           1
zzzz    1           1
zzzz    15          2
zzzz    16          3
zzzz    8           3
zzzz    9           3

So basically, within each domain, I want to assign a group number to runs of rows, starting a new group whenever the difference is greater than or equal to 14. The rows before the first such difference get group 1, and each row where the difference is greater than or equal to 14 begins the next group.

I've tried using a nested for loop, where the domains are levels, but I feel like that may be unnecessarily complex, and I'm not sure how to tell the loop to keep going and pick up where it left off after assigning the first group number. Here's what I have so far:

lev <- levels(e_won$domain)
for (i in 1:length(lev)) { 
  for (j in 1:nrow(lev)){
    if (difference[j] >= 14) {
      grp[1:j] = 1
    }
  }
}

I'm completely open to a non-loop solution; the loop is just what I thought of first.

You can try

library(data.table)
setDT(df1)[, grp:=cumsum(difference>=14)+1L, by=domain][]
#    domain difference grp
#1:   xxxx          0   1
#2:   xxxx          2   1
#3:   xxxx         14   2
#4:   xxxx          3   2
#5:   xxxx          7   2
#6:   xxxx          2   2
#7:   yyyy          6   1
#8:   yyyy          5   1
#9:   yyyy         13   1
#10:  yyyy         10   1
#11:  zzzz          2   1
#12:  zzzz          5   1
#13:  zzzz          1   1
#14:  zzzz         15   2
#15:  zzzz         16   3
#16:  zzzz          8   3
#17:  zzzz          9   3
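
To see why the cumulative sum gives the group number, it may help to look at a single domain. The sketch below uses the `xxxx` values from the question; `d`, `starts` and `grp` are just names chosen for illustration.

```r
# difference values for domain xxxx, taken from the question
d <- c(0, 2, 14, 3, 7, 2)

# TRUE marks the rows where a new group starts
starts <- d >= 14

# cumsum() counts TRUE as 1 and FALSE as 0, so it gives the number of
# group starts seen so far; adding 1 makes the first group 1 instead of 0
grp <- cumsum(starts) + 1L
grp
# [1] 1 1 2 2 2 2
```

Grouping by `domain` simply restarts this count for each domain.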

Or using dplyr

library(dplyr)
df1 %>%
    group_by(domain) %>%
    mutate(grp = cumsum(difference >= 14) + 1L)

Or using base R (from @Colonel Beauvel's comments)

df1$grp <- with(df1, ave(difference>=14, domain, FUN=cumsum)) + 1L
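
For anyone who wants to run this, here is a sketch that rebuilds the sample data from the question and checks the base R version against the desired `grp` column. The name `df1` follows the answer; `expected`, `grp_loop`, `dom` and `current` are names made up for this check. It also includes an explicit loop, roughly what the question was attempting, purely for comparison.

```r
# sample data from the question
df1 <- data.frame(
  domain = rep(c("xxxx", "yyyy", "zzzz"), times = c(6, 4, 7)),
  difference = c(0, 2, 14, 3, 7, 2,
                 6, 5, 13, 10,
                 2, 5, 1, 15, 16, 8, 9)
)

# desired grp column from the question
expected <- c(1, 1, 2, 2, 2, 2,
              1, 1, 1, 1,
              1, 1, 1, 2, 3, 3, 3)

# vectorised version: running count of rows with difference >= 14, per domain
df1$grp <- with(df1, ave(difference >= 14, domain, FUN = cumsum)) + 1L
stopifnot(all(df1$grp == expected))

# loop version: keep a counter per domain and bump it at each threshold row
grp_loop <- integer(nrow(df1))
for (dom in unique(df1$domain)) {
  current <- 1L
  for (i in which(df1$domain == dom)) {
    if (df1$difference[i] >= 14) current <- current + 1L
    grp_loop[i] <- current
  }
}
stopifnot(all(grp_loop == df1$grp))
```

The loop carries its counter forward from row to row, which is the "pick up where it left off" part; `cumsum` does the same bookkeeping in one call.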
