Assigning unique id to duplicated rows

Question

If I have a data frame that looks like this:

x y
13 a
14 b
15 c
15 c
14 b

and I want each group of identical rows to share a unique id, like this:

x y id
13 a 1
14 b 2
15 c 3
15 c 3
14 b 2

Is there an easy way to do this?

Thanks

Answer 1

I have a bit of a concern with the paste0 approach. If your columns contain more complex data, different rows can paste to the same string and you could end up with surprising results. For example, imagine:

 x  y
ab  c
 a bc

One solution is to replace paste0(...) with paste(..., sep = "@"). Even so, no sep is general enough to work with every type of data, since there is always a non-zero probability that the data itself contains sep.
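A quick check of the collision described above, using the two-row example. The data frame names `bad` and `bad2` are just for illustration:

```r
# Two different rows whose pasted keys coincide.
bad <- data.frame(x = c("ab", "a"), y = c("c", "bc"))

key0 <- paste0(bad$x, bad$y)            # "abc" "abc": rows wrongly merged
key1 <- paste(bad$x, bad$y, sep = "@")  # "ab@c" "a@bc": kept apart

length(unique(key0))  # 1 group
length(unique(key1))  # 2 groups

# The separator trick fails again as soon as the data contains "@".
bad2 <- data.frame(x = c("a@b", "a"), y = c("c", "b@c"))
length(unique(paste(bad2$x, bad2$y, sep = "@")))  # 1 group
```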

A more robust approach is split/transform/combine. You can certainly do it with the base package, but plyr makes it a bit easier:

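library(plyr)
.idx <- 0L  # group counter, bumped once per group via <<-
# Split df by all of its columns (one piece per set of identical rows),
# add an id column to each piece; rows come back sorted by group.
ddply(df, colnames(df), transform, id = (.idx <<- .idx + 1L))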
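The base-package route mentioned above is not shown in the answer. One minimal base-R sketch follows. It uses a different technique from split/transform/combine: per-column integer codes are combined arithmetically, so no strings are ever built and no separator can collide. The helper name `row_group_id` is hypothetical, and it assumes every column is an atomic vector. Unlike the ddply call, it keeps the original row order and numbers groups by first appearance.

```r
# Hypothetical helper: give identical rows the same integer id without
# pasting values into strings. Ids are numbered by first appearance.
# Assumes every column is an atomic vector (numeric, character, factor, ...).
row_group_id <- function(df) {
  if (nrow(df) == 0L) return(integer(0))
  id <- rep(1L, nrow(df))  # start with all rows in a single group
  for (j in seq_along(df)) {
    col <- df[[j]]
    code <- match(col, unique(col))  # integer code of each value in column j
    # Encode the pair (current id, code) as one number; distinct pairs give
    # distinct numbers because 1 <= code <= max(code).
    combined <- (id - 1) * max(code) + code
    id <- match(combined, unique(combined))  # re-compact ids to 1..k
  }
  id
}

df <- data.frame(x = c(13, 14, 15, 15, 14), y = c("a", "b", "c", "c", "b"))
df$id <- row_group_id(df)
df$id
# [1] 1 2 3 3 2
```

Rows such as x = "ab", y = "c" and x = "a", y = "bc" get different ids here regardless of which characters the data contains.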

If this is too slow, I would recommend a data.table approach, as proposed here: data.table "key indices" or "group counter"
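For reference, here is a minimal sketch of the data.table route, assuming the data.table package is installed. The special symbol `.GRP` is a counter for the current group, so assigning it by reference with `:=` while grouping on every column gives identical rows the same id. With `by` (as opposed to `keyby`), groups are numbered in order of first appearance and the original row order is kept.

```r
library(data.table)

dt <- data.table(x = c(13, 14, 15, 15, 14), y = c("a", "b", "c", "c", "b"))
# Group by all columns; .GRP is 1 for the first group, 2 for the second, ...
dt[, id := .GRP, by = .(x, y)]
dt
```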

Answer 2

This is the first thing I thought of:

Make a new variable that combines the two columns by pasting their values together into strings:

a <- paste0(z$x, z$y)  # z is your data.frame

Then convert this to a factor and bind it to your data frame:

cbind(z, id = factor(a, labels = 1:length(unique(a))))
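A small variant of the same paste idea, offered as a sketch rather than as part of the original answer: `match(a, unique(a))` returns plain integer ids numbered by first appearance, instead of a factor whose labels follow the sorted order of the pasted strings. It inherits the paste0 collision issue raised in Answer 1. Here `z` is rebuilt from the question's example.

```r
z <- data.frame(x = c(13, 14, 15, 15, 14), y = c("a", "b", "c", "c", "b"))

a <- paste0(z$x, z$y)        # "13a" "14b" "15c" "15c" "14b"
z$id <- match(a, unique(a))  # position of each key among the distinct keys
z$id
```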

EDIT: @flodel raised a concern about paste0; it's better to use ordinary paste (with a separator) or interaction:

a <- interaction(z, drop = TRUE)
cbind(z, id = factor(a, labels = 1:length(unique(a))))

This assumes you want x = ab, y = c and x = a, y = bc to be treated as different rows. If not, use paste0.
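A quick check of the interaction() version on the tricky rows from Answer 1 (the data frame name `tricky` is just for illustration). With its default separator `sep = "."`, interaction() gives the two rows distinct levels, whereas paste0 merges them:

```r
tricky <- data.frame(x = c("ab", "a"), y = c("c", "bc"))

a <- interaction(tricky, drop = TRUE)
levels(a)      # two distinct levels
as.integer(a)  # two different ids

length(unique(paste0(tricky$x, tricky$y)))  # 1: paste0 merges the rows
```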


2 answers

solution1 4 2013-03-08 21:27:34

solution2 3 ACCEPTED 2013-03-08 20:53:26