简体   繁体   中英

R Program: Have dataframe with non-unique IDs. Need to create column with unique IDs

I have a dataframe that has duplicate IDs with unique values. I need to make the IDs unique.

df <- read.csv("test.csv")

ID: A1, A1, A2, A2, A3, A3, A4, A4  
Value: 0.5, 0.9, 1.5, 0.8, 2.2, 2.4, 3.1, 0.5

I need to get this dataframe:

ID: A1_1, A1_2, A2_1, A2_2, A3_1, A3_2, A4_1, A4_2  
Value: 0.5, 0.9, 1.5, 0.8, 2.2, 2.4, 3.1, 0.5

I tried the following code which adds a column with repeating alternating _1 and _2 and concatenates to the ID:

unique <- c("_1", "_2")  
Unique.col <- matrix(rep(unique, ))  
unique_ID <- cbind(df, Unique.col)  
unique_ID$ID <- paste(unique_ID$ID, unique_ID$Unique.col)  
unique_ID

I get the following dataframe where there is a space between the A1 and _1:

ID: A1 _1, A1 _2, A2 _1, A2 _2, A3 _1, A3 _2, A4 _1, A4 _2  
Value: 0.5, 0.9, 1.5, 0.8, 2.2, 2.4, 3.1, 0.5

Is there a better way to do this or a way to get rid of the space?

You can use gsub(" ","",unique_ID) to remove spaces

Example:

unique_ID <- c("A1 _1", "A1 _2", "A2 _1", "A2 _2", "A3 _1", "A3 _2", "A4 _1", 
"A4 _2")
test <- gsub(" ","",unique_ID)

> test
[1] "A1_1" "A1_2" "A2_1" "A2_2" "A3_1" "A3_2" "A4_1"
[8] "A4_2"

A general dplyr / tidyr way to solve this is to leverage pivot_longer and pivot_wider together: lengthen, then group by the original column names and create unique within-group IDs, then widen back. This seems a little bit deviant from the usual spirit of pivoting, but it gets the job done!

Sample data:

df <- tribble(
    ~"A1", ~"A1", ~"A2", ~"A2", ~"A3", ~"A3", ~"A4", ~"A4",
    1, 2, 3, 4, 5, 6, 7, 8
)

For creating the unique within-group IDs, see this answer . For combining those IDs with the original column names, the tidyr pivoting vignette has some great examples (eg here ). The key is to use the names_from argument to combine the original column name with the new IDs. This gets us to:

df %>%
    # Pivot original column names to "name" column and original values to "value" column
    pivot_longer(cols=everything()) %>%
    # Create unique IDs within each original column
    group_by(name) %>%
    mutate(row_id=row_number()) %>%
    # Pivot back to the desired wider format
    pivot_wider(names_from=c(name, row_id))

Output:

# A tibble: 1 x 8
   A1_1  A1_2  A2_1  A2_2  A3_1  A3_2  A4_1  A4_2
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     3     4     5     6     7     8

We can also change the format of the new column names in the call to pivot_wider , either with names_sep (default is _ ) or with names_pattern (which takes a regular expression).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM