
R - Identical values in columns of dataframe in one row

I have a data frame containing 3 columns of non-integer values. A lot of the time, a value in one column is identical to a value in one or both of the other columns of the same data frame. If there are matches between columns, I would like to have them on the same row.

See subset_df vs expected_subset_df below for clarification.

Notice that the values ending in "348:-" are in the same row in expected_subset_df but not in subset_df.

Summary: values in col1 can also be in col2 and/or col3. If the values between columns do match I want them on the same row.

> subset_df
         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31724051:- 20:31724051:-
3         FALSE 20:31722348:- 20:31722348:-
> expected_subset_df
         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31722348:- 20:31722348:-
3         FALSE 20:31724051:- 20:31724051:-

What I have attempted

library(dplyr)
subset_df %>%
    mutate_all(as.character) %>%
    mutate(col1 = subset_df$col1[match(subset_df$col2, subset_df$col1)],
           col3 = subset_df$col3[match(subset_df$col2, subset_df$col3)])

Yields:

         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2          <NA> 20:31724051:- 20:31724051:-
3 20:31722348:- 20:31722348:- 20:31722348:-

Is this method robust? Is there a better alternative?
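On robustness: the attempt looks everything up from col2, so a col2 value that is missing from col1 becomes <NA>, and a value that occurs only in col1 (here FALSE) is dropped. A minimal sketch of one alternative in base R, assuming the placeholder is stored as the string "FALSE" and that unmatched cells may be reported as NA: collect every distinct value once, then fill each column only where that value occurs. The name align_columns is hypothetical, not part of any package.

```r
# Hypothetical helper (not from the original post): one output row per
# distinct value, with NA where a column does not contain that value.
align_columns <- function(df, placeholder = "FALSE") {
  cols <- lapply(df, as.character)
  # All distinct values, in order of first appearance (col1, then col2, ...)
  vals <- unique(unlist(cols, use.names = FALSE))
  # Assumption: the placeholder and NA are not real values
  vals <- vals[!is.na(vals) & vals != placeholder]
  # Keep the value where the column contains it, otherwise NA
  out <- lapply(cols, function(x) ifelse(vals %in% x, vals, NA_character_))
  as.data.frame(out, stringsAsFactors = FALSE)
}

subset_df <- data.frame(
  col1 = c("20:31722330:-", "20:31722348:-", "FALSE"),
  col2 = c("20:31722330:-", "20:31724051:-", "20:31722348:-"),
  col3 = c("20:31722330:-", "20:31724051:-", "20:31722348:-"),
  stringsAsFactors = FALSE
)

aligned <- align_columns(subset_df)
print(aligned)
```

Because every distinct value gets exactly one row, no value is lost, whichever column it first appears in. The unmatched cell comes back as NA rather than FALSE.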

Edit:

Suppose the data frame breakpoint looks like this:

> breakpoint
           col1          col2          col3
1 20:31722330:- 20:31722344:-         FALSE
2 21:15014555:- 21:15014555:-         FALSE
3 21:15014767:- 21:15014767:- 21:15014767:-

How can I turn the data frame breakpoint into this:

> expected_breakpoint
           col1          col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA> 20:31722344:-          <NA>
3 21:15014555:- 21:15014555:-          <NA>
4          <NA>          <NA>         FALSE
5          <NA>          <NA>         FALSE
6 21:15014767:- 21:15014767:- 21:15014767:-

Edit 2: FALSE converted to <NA> before analysis

Suppose the data frame breakpoint_new looks like this:

> breakpoint_new
           col1          col2          col3
1 20:31722330:- 20:31722344:-          <NA>
2 21:15014555:- 21:15014555:-          <NA>
3 21:15014767:- 21:15014767:- 21:15014767:-
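The conversion from FALSE to <NA> could be done as sketched here. This assumes the columns are character vectors, so the placeholder is the string "FALSE" rather than a logical value; the data frame is rebuilt from the printed example.

```r
# The original data frame, with "FALSE" as the placeholder string
breakpoint <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c("FALSE", "FALSE", "21:15014767:-"),
  stringsAsFactors = FALSE
)

# Replace every "FALSE" cell with NA; the columns stay character
breakpoint_new <- breakpoint
breakpoint_new[breakpoint_new == "FALSE"] <- NA
print(breakpoint_new)
```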

How can I turn the data frame breakpoint_new into this:

> expected_breakpoint_new
           col1          col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA> 20:31722344:-          <NA>
3 21:15014555:- 21:15014555:-          <NA>
4 21:15014767:- 21:15014767:- 21:15014767:-
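One possible way to get this shape in base R is a full outer join of the columns on their own values. This is only a sketch under assumptions: the columns are character, NA marks a missing value, and the temporary column name key is made up for the example.

```r
breakpoint_new <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c(NA, NA, "21:15014767:-"),
  stringsAsFactors = FALSE
)

# One small table per column: the value serves as both join key and payload
pieces <- lapply(names(breakpoint_new), function(nm) {
  v <- breakpoint_new[[nm]]
  v <- unique(v[!is.na(v)])
  setNames(data.frame(v, v, stringsAsFactors = FALSE), c("key", nm))
})

# Full outer join of all pieces on the key ("key" is a hypothetical name)
joined <- Reduce(function(a, b) merge(a, b, by = "key", all = TRUE), pieces)

# Sort by value (locale-independent) and drop the helper column
joined <- joined[order(joined$key, method = "radix"), ]
result <- joined[names(breakpoint_new)]
rownames(result) <- NULL
print(result)
```

Using Reduce() with merge() means the same code works unchanged for more than three columns.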

The following function solves my problem:

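match_columns = function(df, nomatch = FALSE){
  if (ncol(df) != 3){
    stop("Input data frame needs to have 3 columns")
  }
  # Access the columns by position so that any column names work;
  # as.character() also guards against factor columns
  col1 = as.character(df[[1]])
  col2 = as.character(df[[2]])
  col3 = as.character(df[[3]])

  # TRUE for cells that are NA or equal to the nomatch placeholder
  skip = function(item){
    is.na(item) || identical(item, as.character(nomatch))
  }

  # Rows are collected here; the name avoids shadowing base::matrix()
  result = matrix(ncol = 3, nrow = 0)
  # Values shared by each pair of columns
  match12 = intersect(col1, col2)
  match23 = intersect(col2, col3)
  match13 = intersect(col1, col3)

  # Values in columns 1 and 2 (and possibly 3)
  for (item in match12){
    if (skip(item)){next}
    if (item %in% match23){
      result = rbind(result, c(rep(item, 3)))
    }else{
      result = rbind(result, c(rep(item, 2), nomatch))
    }
  }

  # Values in columns 1 and 3 only
  for (item in match13){
    if (skip(item)){next}
    if (!(item %in% match12)){
      result = rbind(result, c(item, nomatch, item))
    }
  }

  # Values in columns 2 and 3 only
  for (item in match23){
    if (skip(item)){next}
    if (!(item %in% match13)){
      result = rbind(result, c(nomatch, rep(item, 2)))
    }
  }

  # Values that occur only in column 1
  for (item in col1){
    if (skip(item)){next}
    if (!(item %in% match12) && !(item %in% match13)){
      result = rbind(result, c(item, rep(nomatch, 2)))
    }
  }

  # Values that occur only in column 2
  for (item in col2){
    if (skip(item)){next}
    if (!(item %in% match12) && !(item %in% match23)){
      result = rbind(result, c(nomatch, item, nomatch))
    }
  }

  # Values that occur only in column 3
  for (item in col3){
    if (skip(item)){next}
    if (!(item %in% match13) && !(item %in% match23)){
      result = rbind(result, c(rep(nomatch, 2), item))
    }
  }

  return(result)
}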

Values in each column are matched with identical values in the other columns. FALSE is inserted wherever a value does not occur in all three columns.
