
Renaming duplicated rows

I've got a time-series dataframe that looks like this:

...
year    site
1987    ak12
1976    ak12
1766    ak13
1818    ak13
1987    ak12
2001    ak12
...

As you can see, some site names are duplicated (in this case ak12). I want to rename one of the ak12 time series to a unique name (e.g. 'ak12_a') without sorting the rows. Like this:

...
year    site
1987    ak12
1976    ak12
1766    ak13
1818    ak13
1987    ak12_a
2001    ak12_a
...

I know about the make.unique function, but I don't know how to apply it here, because every site name is repeated on each row of its series anyway (one row per year). So I need code that, whenever it 'meets' a second series with the same name, renames all of that series' rows. How can I do this?

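I would suggest a for loop that walks through the rows, tracks which site names have already appeared in an earlier block, and adds a suffix to every row of a repeated block.

# Assumes a data frame `df` with a character column `site`.
seen <- character(0)           # one entry per block of rows already started
suffix <- character(nrow(df))  # suffix for each row, "" by default
for (i in seq_len(nrow(df))) {
  s <- df$site[i]
  if (i > 1 && df$site[i - 1] == s) {
    suffix[i] <- suffix[i - 1]           # same block: reuse the previous suffix
  } else {
    n <- sum(seen == s)                  # earlier blocks with this site name
    if (n > 0) suffix[i] <- paste0("_", letters[n])
    seen <- c(seen, s)
  }
}
df$site <- paste0(df$site, suffix)

I have only checked this against the six sample rows, so please point out anything that looks wrong. Note that the letters suffix runs out after 26 repeated blocks of the same site.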

Does this work:

library(dplyr)
library(stringr)
df %>%
  group_by(year) %>%
  mutate(site = case_when(duplicated(site) ~ str_c(site, '_a', sep = ''), TRUE ~ site))
# A tibble: 6 x 2
# Groups:   year [5]
   year site  
  <dbl> <chr> 
1  1987 ak12  
2  1976 ak12  
3  1766 ak13  
4  1818 ak13  
5  1987 ak12_a
6  2001 ak12  

Data used:

df
# A tibble: 6 x 2
   year site 
  <dbl> <chr>
1  1987 ak12 
2  1976 ak12 
3  1766 ak13 
4  1818 ak13 
5  1987 ak12 
6  2001 ak12 

Is this what you are looking for?

df <- within(df, site <- ave(site, year, FUN = make.unique))

Output

> df
  year   site
1 1987   ak12
2 1976   ak12
3 1766   ak13
4 1818   ak13
5 1987 ak12.1
6 2001   ak12

Data I used

structure(list(year = c(1987L, 1976L, 1766L, 1818L, 1987L, 2001L
), site = c("ak12", "ak12", "ak13", "ak13", "ak12", "ak12")), class = "data.frame", row.names = c(NA, 
-6L))
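
Both answers above group by year, so they only rename a row whose (year, site) pair has already appeared: row 5 (1987) is renamed, but row 6 (2001) keeps the name ak12, which differs from the output requested in the question. If the goal is to rename every row of the second ak12 block, one possible sketch is to work on runs of consecutive identical names with base R's rle() and inverse.rle(). It assumes that each series occupies contiguous rows and that no site has more than 26 repeated blocks (the suffix comes from letters); the name occ is just an illustrative helper.

```r
df <- data.frame(
  year = c(1987, 1976, 1766, 1818, 1987, 2001),
  site = c("ak12", "ak12", "ak13", "ak13", "ak12", "ak12"),
  stringsAsFactors = FALSE
)

# Collapse the column into runs of consecutive identical site names.
runs <- rle(df$site)

# For each run, count how many runs with the same name came before it (1 = first).
occ <- ave(seq_along(runs$values), runs$values, FUN = seq_along)

# Leave the first run of each name alone; suffix later runs with _a, _b, ...
repeated <- occ > 1
runs$values[repeated] <- paste0(runs$values[repeated], "_", letters[occ[repeated] - 1])

# Expand the runs back to one value per row; row order is untouched.
df$site <- inverse.rle(runs)
df
```

On the sample data this should give ak12, ak12, ak13, ak13, ak12_a, ak12_a.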

An option with data.table

library(data.table)
setDT(df)[, site := make.unique(site), by = year]
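
Grouping by year renames only rows whose (year, site) pair repeats, so the 2001 row would stay ak12. If whole blocks should be renamed instead, a hedged data.table sketch could number the blocks with rleid() and suffix every block after the first one for each site. The helper columns run and occ are made up for illustration and dropped at the end; the sketch assumes each series sits in contiguous rows and that no site has more than 26 repeated blocks.

```r
library(data.table)

dt <- data.table(
  year = c(1987, 1976, 1766, 1818, 1987, 2001),
  site = c("ak12", "ak12", "ak13", "ak13", "ak12", "ak12")
)

# Number each block of consecutive identical site names.
dt[, run := rleid(site)]

# Within each site, rank its blocks in order of appearance (1 = first block).
dt[, occ := match(run, unique(run)), by = site]

# Suffix every block after the first with _a, _b, ...
dt[occ > 1, site := paste0(site, "_", letters[occ - 1])]

# Drop the helper columns.
dt[, c("run", "occ") := NULL]
dt
```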


 