R: Uniting two columns into a single column with unique values

I would appreciate your help with uniting two columns into a single column while keeping the new values unique. I tried to find a solution, but since I'm terrible at writing loops in R, maybe it's better if someone shows me the right way to do this.

Let's say I have a dataset like this:

place   year
A   2018
A   2018
B   2018
C   2018
C   2018
C   2019
C   2019

I would like to create a new column (variable) that combines both columns (place and year) but adds a numeric suffix in the case of repetitions. For example, C has two rows each for 2018 and 2019. I would like the values of the new variable to be "C_2018.1" and "C_2018.2", if that makes sense. I know how to combine variables into strings, but adding a counter for the non-unique values is what I'm not sure about. Maybe I need loops?

data$new_v <- paste(data$place, data$year, sep = "_")

I hope this makes sense; I guess it should be quite easy.

Loops might be easier but...

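# Number the rows within each place_year key; unlike unlist(sapply(table(...))), this does not rely on the rows being sorted
data$ctr <- ave(seq_along(data$new_v), data$new_v, FUN = seq_along)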

And then you could do

data$new_v <- paste(data$new_v, data$ctr, sep = ".")

This would leave the singletons (like B) still carrying a .1 suffix.
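If the suffix should appear only where a place/year pair is actually repeated, as the question describes, one possible base R sketch is below. The helper names `key`, `idx`, and `size` are made up for illustration, and the sample data is copied from the question.

```r
# Sample data from the question
data <- data.frame(
  place = c("A", "A", "B", "C", "C", "C", "C"),
  year  = c(2018, 2018, 2018, 2018, 2018, 2019, 2019)
)

# Combined key, e.g. "A_2018"
key <- paste(data$place, data$year, sep = "_")

# Position of each row within its key group, and the size of that group
idx  <- ave(seq_along(key), key, FUN = seq_along)
size <- ave(seq_along(key), key, FUN = length)

# Append ".<n>" only where the key occurs more than once
data$new_v <- ifelse(size > 1, paste(key, idx, sep = "."), key)
```

With this, B keeps the plain value "B_2018" while the repeated rows get ".1", ".2", and so on.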

You can solve this with dplyr:

library(dplyr)

data <- data %>%
  group_by(place, year) %>%
  mutate(new_v = paste0(place, "_", year, ".", row_number())) %>%
  ungroup()

The group_by clause causes row_number() to count within the groups, starting from 1.
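To illustrate the per-group counting, here is a small sketch that assumes the same rows as in the question but deliberately shuffled; the object name `result` is only for illustration. Because the counter is computed per group, the rows do not need to be sorted first.

```r
library(dplyr)

# Same rows as in the question, deliberately out of order
data <- data.frame(
  place = c("C", "A", "C", "B", "A", "C", "C"),
  year  = c(2019, 2018, 2018, 2018, 2018, 2019, 2018)
)

result <- data %>%
  group_by(place, year) %>%
  # row_number() restarts at 1 for every (place, year) combination
  mutate(new_v = paste0(place, "_", year, ".", row_number())) %>%
  ungroup()

result$new_v
```

Each row is numbered by its order of appearance within its own group, so the first "C"/2019 row gets ".1" and the second gets ".2" even though other rows sit between them.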

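library(data.table)

df <- data.table(place = c("A","A","B","C","C","C","C"),
                 year = c(2018,2018,2018,2018,2018,2019,2019))
# .N is the group size, so seq_len(.N) numbers the rows within each (place, year) group
df[, counter := seq_len(.N), by = c("place", "year")]
# "_" joins place and year; "." precedes the counter, matching the format in the question
df[, new_var := paste0(place, "_", year, ".", counter)]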
