R - Identical values in columns of dataframe in one row

I have a data frame containing 3 columns of non-integer values. Values in one column are often identical to values in one or both of the other columns of the same data frame. If a value matches across columns, I would like the matching values to sit on the same row.

See subset_df vs expected_subset_df below for clarification.

Notice that the values ending in "348:-" are on the same row in expected_subset_df but not in subset_df.

Summary: values in col1 can also occur in col2 and/or col3. If values match between columns, I want them on the same row.

> subset_df
         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31724051:- 20:31724051:-
3         FALSE 20:31722348:- 20:31722348:-
> expected_subset_df
         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2 20:31722348:- 20:31722348:- 20:31722348:-
3         FALSE 20:31724051:- 20:31724051:-

What I have attempted

library(dplyr)
subset_df %>%
  mutate_all(as.character) %>%
  mutate(col1 = subset_df$col1[match(subset_df$col2, subset_df$col1)],
         col3 = subset_df$col3[match(subset_df$col2, subset_df$col3)])

Yields:

         col1          col2          col3
1 20:31722330:- 20:31722330:- 20:31722330:-
2          <NA> 20:31724051:- 20:31724051:-
3 20:31722348:- 20:31722348:- 20:31722348:-
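
The <NA> in row 2 seems to come from how match() works: it returns the position of the first match, or NA when the value is absent. A minimal sketch with the col1 and col2 values from subset_df, rebuilt here as plain character vectors (an assumption about the column types):

```r
# Values copied from subset_df above; character vectors are assumed.
col1 <- c("20:31722330:-", "20:31722348:-", "FALSE")
col2 <- c("20:31722330:-", "20:31724051:-", "20:31722348:-")

# Position of each col2 value inside col1, NA where col1 lacks it.
idx <- match(col2, col1)

# Reordering col1 by idx keeps only values that also occur in col2.
reordered <- col1[idx]
print(idx)
print(reordered)
```

Note that any col1 value with no counterpart in col2 is dropped by this reordering (here the "FALSE" entry), so the approach only works when col2 already contains every value of interest.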

Is this method robust? Is there a better alternative?

Edit:

Suppose the data frame breakpoint looks like this:

> breakpoint
           col1          col2          col3
1 20:31722330:- 20:31722344:-         FALSE
2 21:15014555:- 21:15014555:-         FALSE
3 21:15014767:- 21:15014767:- 21:15014767:-

How can I turn the data frame breakpoint into this:

> expected_breakpoint
           col1          col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA> 20:31722344:-          <NA>
3 21:15014555:- 21:15014555:-          <NA>
4          <NA>          <NA>         FALSE
5          <NA>          <NA>         FALSE
6 21:15014767:- 21:15014767:- 21:15014767:-

Edit 2: FALSE converted to <NA> before analysis

Suppose the data frame breakpoint_new looks like this:

> breakpoint_new
           col1          col2          col3
1 20:31722330:- 20:31722344:-          <NA>
2 21:15014555:- 21:15014555:-          <NA>
3 21:15014767:- 21:15014767:- 21:15014767:-
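
For reference, the conversion from FALSE to <NA> could be done along these lines. This is a sketch that assumes the columns are character vectors holding the literal text "FALSE"; the data frame is rebuilt by hand from the printout above:

```r
# breakpoint rebuilt from the printout; character columns are an assumption.
breakpoint <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c("FALSE", "FALSE", "21:15014767:-"),
  stringsAsFactors = FALSE
)

# Replace every cell equal to the text "FALSE" with a real missing value.
breakpoint_new <- breakpoint
breakpoint_new[breakpoint_new == "FALSE"] <- NA
print(breakpoint_new)
```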

How can I turn the data frame breakpoint_new into this:

> expected_breakpoint_new
           col1          col2          col3
1 20:31722330:-          <NA>          <NA>
2          <NA> 20:31722344:-          <NA>
3 21:15014555:- 21:15014555:-          <NA>
4 21:15014767:- 21:15014767:- 21:15014767:-
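
One possible vectorised way to get this layout, sketched in base R: collect the sorted unique values across all columns, then give each value one row and fill a column only where that column contains the value. The helper name align_columns is hypothetical, and the sketch assumes placeholders have already been turned into NA and that duplicates within a column should collapse to one row:

```r
# breakpoint_new rebuilt from the printout; character columns are assumed.
breakpoint_new <- data.frame(
  col1 = c("20:31722330:-", "21:15014555:-", "21:15014767:-"),
  col2 = c("20:31722344:-", "21:15014555:-", "21:15014767:-"),
  col3 = c(NA, NA, "21:15014767:-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per distinct non-NA value.
align_columns <- function(df) {
  cols <- lapply(df, as.character)
  # sort() drops NA by default, so only real values get a row.
  vals <- sort(unique(unlist(cols, use.names = FALSE)))
  # Keep the value where the column contains it, NA elsewhere.
  out <- lapply(cols, function(col) ifelse(vals %in% col, vals, NA_character_))
  as.data.frame(out, stringsAsFactors = FALSE)
}

aligned <- align_columns(breakpoint_new)
print(aligned)
```

The row order here follows the sort order of the values rather than their order of appearance, which happens to agree with expected_breakpoint_new for this data.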

The following function solves my problem:

match_columns = function(df, nomatch = FALSE) {
  if (ncol(df) != 3) {
    stop("Input data frame needs to have 3 columns")
  }
  # Columns are taken by position (they were hard-coded as object, object.1,
  # object.2) and converted to character so factors are compared by label.
  x1 = as.character(df[[1]])
  x2 = as.character(df[[2]])
  x3 = as.character(df[[3]])
  # Entries to ignore: missing values and the nomatch placeholder itself.
  skip = function(item) is.na(item) || (!is.na(nomatch) && item == nomatch)

  result = matrix(ncol = 3, nrow = 0)
  match12 = intersect(x1, x2)
  match23 = intersect(x2, x3)
  match13 = intersect(x1, x3)

  # Values shared by columns 1 and 2, and possibly column 3 as well.
  for (item in match12) {
    if (skip(item)) next
    if (item %in% match23) {
      result = rbind(result, rep(item, 3))
    } else {
      result = rbind(result, c(rep(item, 2), nomatch))
    }
  }

  # Values shared by columns 1 and 3 only.
  for (item in match13) {
    if (skip(item)) next
    if (!(item %in% match12)) {
      result = rbind(result, c(item, nomatch, item))
    }
  }

  # Values shared by columns 2 and 3 only.
  for (item in match23) {
    if (skip(item)) next
    if (!(item %in% match13)) {
      result = rbind(result, c(nomatch, rep(item, 2)))
    }
  }

  # Values that occur in a single column only.
  for (item in x1) {
    if (skip(item)) next
    if (!(item %in% match12) && !(item %in% match13)) {
      result = rbind(result, c(item, rep(nomatch, 2)))
    }
  }

  for (item in x2) {
    if (skip(item)) next
    if (!(item %in% match12) && !(item %in% match23)) {
      result = rbind(result, c(nomatch, item, nomatch))
    }
  }

  for (item in x3) {
    if (skip(item)) next
    if (!(item %in% match13) && !(item %in% match23)) {
      result = rbind(result, c(rep(nomatch, 2), item))
    }
  }

  return(result)
}

Values in each column are matched with identical values in the other columns. FALSE is inserted wherever a value does not occur in a column.
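
Because rbind() combines strings with the logical placeholder, the returned matrix is of type character and unmatched cells hold the text "FALSE". If real missing values are preferred, a conversion along these lines might work; the matrix m below is a hand-written stand-in for the function's output (not actual output), and the column names are borrowed from the question:

```r
# Hand-written stand-in for a result matrix; values are illustrative.
m <- rbind(c("21:15014555:-", "21:15014555:-", "FALSE"),
           c("20:31722330:-", "FALSE", "FALSE"))

# Turn the "FALSE" placeholders into NA, then convert to a data frame.
m[m == "FALSE"] <- NA
aligned <- as.data.frame(m, stringsAsFactors = FALSE)
names(aligned) <- c("col1", "col2", "col3")
print(aligned)
```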
