If I have a data frame which looks like this:
x y
13 a
14 b
15 c
15 c
14 b
and I wanted each group of equal rows to have a unique id, like this:
x y id
13 a 1
14 b 2
15 c 3
15 c 3
14 b 2
Is there any easy way of doing this?
Thanks
I have a bit of a concern with the paste0 approach. If your columns contain more complex data, you could end up with surprising results, e.g. imagine:
x y
ab c
a bc
One solution is to replace paste0(...) with paste(..., sep = "@"). Even so, you cannot come up with a sep general enough that it will work with any type of data, as there is always a non-zero probability that sep will be contained in some kind of data.
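To make the collision concrete, here is a minimal sketch in base R. The data frames df and df2 and the key_* variable names are made up for illustration; only paste0 and paste come from the discussion above.

```r
# Two distinct rows whose concatenated values are identical
df <- data.frame(x = c("ab", "a"), y = c("c", "bc"), stringsAsFactors = FALSE)

key_plain <- paste0(df$x, df$y)           # "abc" "abc"   -> rows look equal
key_sep <- paste(df$x, df$y, sep = "@")   # "ab@c" "a@bc" -> rows are told apart

# The separator only helps as long as it never occurs in the data itself
df2 <- data.frame(x = c("a@b", "a"), y = c("c", "b@c"), stringsAsFactors = FALSE)
key_bad <- paste(df2$x, df2$y, sep = "@") # "a@b@c" "a@b@c" -> collision again
```

The last case is why no single choice of sep can be safe for arbitrary data.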
A more robust solution is a split/transform/combine approach. You can certainly do it with the base package, but plyr makes it a bit easier:
library(plyr)
# Counter incremented via <<- once per group of identical rows
.idx <- 0L
ddply(df, colnames(df), transform, id = (.idx <<- .idx + 1L))
If this is too slow, I would recommend a data.table approach, as proposed here: data.table "key indices" or "group counter"
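As a rough sketch of what a data.table version might look like (this is an assumption about the linked approach, not a copy of it), the special symbol .GRP holds a group counter inside a grouped operation. The example data and the name dt are made up to match the question.

```r
library(data.table)

# Example data from the question
df <- data.frame(x = c(13, 14, 15, 15, 14),
                 y = c("a", "b", "c", "c", "b"))

dt <- as.data.table(df)

# .GRP is the running group number; grouping on all columns gives
# each set of identical rows the same id, in order of first appearance
dt[, id := .GRP, by = c("x", "y")]

dt$id  # 1 2 3 3 2
```

Unlike the ddply call, this keeps the rows in their original order and does not rely on pasting values together.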
This is the first thing I thought of:
Make a new variable which combines the two columns by pasting their values into strings:
a<-paste0(z$x,z$y) #z is your data.frame
Then make this a factor and bind it to your data frame:
cbind(z,id=factor(a,labels=1:length(unique(a))))
EDIT: @flodel was concerned about using paste0; it's better to use ordinary paste, or interaction:
a<-interaction(z,drop=TRUE)
cbind(z,id=factor(a,labels=1:length(unique(a))))
This assumes that you want to distinguish x=ab, y=c from x=a, y=bc. If not, then use paste0.
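For reference, here is a small sketch of the interaction variant applied to the data from the question. The one change from the code above is that the factor is converted with as.integer, which is an assumption that a plain integer id is wanted rather than a factor column.

```r
# Example data from the question
z <- data.frame(x = c(13, 14, 15, 15, 14),
                y = c("a", "b", "c", "c", "b"))

# One factor level per combination of values that actually occurs
a <- interaction(z, drop = TRUE)

# The integer codes of the factor serve as the id
z$id <- as.integer(a)

z$id  # 1 2 3 3 2
```

Note that the ids follow the order of the factor levels, not the order in which rows first appear; the two happen to coincide for this data.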