
Creating Groups by Matching Values of Different Columns

I would like to create groups in a data frame by matching values across columns.

I have the following data:

now<-c(1,2,3,4,24,25,26,5,6,21,22,23)
before<-c(0,1,2,3,23,24,25,4,5,0,21,22)
after<-c(2,3,4,5,25,26,0,6,0,22,23,24)
df<-as.data.frame(cbind(now,before,after))

which produces the following data:

   now before after
1    1      0     2
2    2      1     3
3    3      2     4
4    4      3     5
5   24     23    25
6   25     24    26
7   26     25     0
8    5      4     6
9    6      5     0
10   21      0    22
11   22     21    23
12   23     22    24

I would like to get:

   now before after group
1    1      0     2     A
2    2      1     3     A
3    3      2     4     A
4    4      3     5     A
5    5      4     6     A
6    6      5     0     A
7   21      0    22     B
8   22     21    23     B
9   23     22    24     B
10  24     23    25     B
11  25     24    26     B
12  26     25     0     B

I would like to get this result without using a "for" loop because the real data is too large.

Any help you could provide would be appreciated.

Here is one way. It is hard to avoid a for loop here, as this is quite a tricky algorithm. Objections to for loops are often on the grounds of elegance rather than speed, and sometimes they are entirely appropriate. The idea is to start with every row in its own group, then repeatedly merge each row with the rows whose before or after value equals its now value, until a full pass changes nothing.

df$group <- seq_len(nrow(df)) #assign each row to its own group

converged <- FALSE #indicates convergence (named to avoid masking base::stop)

while(!converged){
  pre <- df$group #group column at start of pass

  for(i in seq_len(nrow(df))){
    matched <- which(df$before==df$now[i] | df$after==df$now[i]) #rows whose before or after equals this row's now
    group <- min(df$group[i], df$group[matched]) #identify smallest group no of matching rows
    df$group[i] <- group #set to smallest group
    df$group[matched] <- group #set to smallest group
  }

  if(identical(df$group, pre)) converged <- TRUE #stop if a full pass changed nothing
}

df$group <- LETTERS[match(df$group, sort(unique(df$group)))] #convert groups to letters
#(LETTERS has only 26 entries, so with more than 26 groups keep the integers from match(...) instead)

df <- df[order(df$group, df$now),] #reorder as required

df
   now before after group
1    1      0     2     A
2    2      1     3     A
3    3      2     4     A
4    4      3     5     A
8    5      4     6     A
9    6      5     0     A
10  21      0    22     B
11  22     21    23     B
12  23     22    24     B
5   24     23    25     B
6   25     24    26     B
7   26     25     0     B
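
If the inner for loop turns out to be too slow on the real data, the same "take the smallest group number among linked rows" idea can be sketched with vectorised lookups. This is only a sketch, not a drop-in replacement: group_chains is a hypothetical helper name, and it assumes that every value in now is unique and that links are reciprocal (if row x lists y as its after, row y lists x as its before), which holds for the example data. It still iterates until nothing changes, but each pass is a handful of vector operations rather than a loop over rows.

```r
# Hypothetical helper: label rows by chain using vectorised lookups.
# Assumes `now` values are unique and before/after links are reciprocal.
group_chains <- function(now, before, after) {
  prev_idx <- match(before, now)  # row holding each row's predecessor (NA if none, e.g. before == 0)
  next_idx <- match(after, now)   # row holding each row's successor (NA if none)
  label <- seq_along(now)         # start with every row in its own group

  repeat {
    # smallest label among the row itself, its predecessor and its successor
    new_label <- pmin(label, label[prev_idx], label[next_idx], na.rm = TRUE)
    # shortcut: adopt the label of the row this label points at (same chain, never larger)
    new_label <- new_label[new_label]
    if (identical(new_label, label)) break  # a full pass changed nothing
    label <- new_label
  }

  match(label, sort(unique(label)))  # renumber groups as 1, 2, 3, ...
}

# Example data from the question
now    <- c(1, 2, 3, 4, 24, 25, 26, 5, 6, 21, 22, 23)
before <- c(0, 1, 2, 3, 23, 24, 25, 4, 5, 0, 21, 22)
after  <- c(2, 3, 4, 5, 25, 26, 0, 6, 0, 22, 23, 24)
df <- data.frame(now, before, after)

df$group <- LETTERS[group_chains(df$now, df$before, df$after)]
df <- df[order(df$group, df$now), ]  # reorder as in the question
```

On the example data this puts the 1 to 6 chain in group A and the 21 to 26 chain in group B, the same as the loop version. If now can contain duplicates, match() only finds the first occurrence, so the loop version above is the safer choice.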
