
Merging dataframes in R with duplicated values in rownames

I'm trying to use the rbind() function exactly as in this post.

I have 3 data frames with 2 columns (row names and "source").

The first and second data frames share 2 duplicated row names, and the first and third data frames share one.

a

               source
TMCS09g1008676 fleshy
TMCS09g1008677 fleshy
TMCS09g1008678 fleshy
TMCS09g1008679 fleshy
TMCS09g1008680 fleshy
TMCS09g1008681 fleshy
TMCS09g1008682 fleshy
TMCS09g1008683 fleshy

b

               source
TMCS09g1008684 rotten
TMCS09g1008685 rotten
TMCS09g1008686 rotten
TMCS09g1008682 rotten
TMCS09g1008688 rotten
TMCS09g1008689 rotten
TMCS09g1008690 rotten
TMCS09g1008691 rotten
TMCS09g1008683 rotten
TMCS09g1008693 rotten

c

               source
TMCS09g1008695   good
TMCS09g1008696   good
TMCS09g1008697   good
TMCS09g1008698   good
TMCS09g1008683   good
TMCS09g1008700   good
TMCS09g1008701   good
TMCS09g1008702   good
TMCS09g1008703   good
TMCS09g1008704   good
TMCS09g1008705   good

After applying the code suggested in that post:

duprows <- which(!is.na(match(rownames(a), rownames(b))))
rbind(a, b[-duprows,])

I get this:

> rbind(a, b[-duprows,])
               source
TMCS09g1008677 fleshy
TMCS09g1008678 fleshy
TMCS09g1008679 fleshy
TMCS09g1008680 fleshy
TMCS09g1008681 fleshy
TMCS09g1008682 fleshy
TMCS09g1008683 fleshy
8                <NA>
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 1L) :
  invalid factor level, NA generated

What I mainly want is to keep the value from the second column of data frame "a" for the duplicated row names. I hope I was clear.
Thanks in advance!
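A quick way to see what goes wrong is to rebuild a and b as described (IDs as row names, one column source; this reconstruction is an assumption, and it uses character columns where the question's data apparently used factors) and inspect the intermediate values. match(rownames(a), rownames(b)) returns, for each row of a, its position in b, so which() yields positions in a (7 and 8), and those are then used to drop rows of b. Also, because b has a single column, b[-duprows, ] collapses to a plain vector unless drop = FALSE is given, which would explain why rbind() produced a single odd row plus the factor warning.

```r
# Assumed reconstruction of the question's a and b: IDs stored as row names
# and a single character column "source".
a <- data.frame(source = rep("fleshy", 8),
                row.names = paste0("TMCS09g", 1008676:1008683),
                stringsAsFactors = FALSE)
b <- data.frame(source = rep("rotten", 10),
                row.names = paste0("TMCS09g", c(1008684L, 1008685L, 1008686L, 1008682L,
                                                1008688L, 1008689L, 1008690L, 1008691L,
                                                1008683L, 1008693L)),
                stringsAsFactors = FALSE)

duprows <- which(!is.na(match(rownames(a), rownames(b))))
duprows                                    # 7 8: positions in a, not in b
rownames(b)[duprows]                       # "TMCS09g1008690" "TMCS09g1008691": rows that get dropped
rownames(b)[rownames(b) %in% rownames(a)]  # "TMCS09g1008682" "TMCS09g1008683": the real duplicates

class(b[-duprows, ])                  # "character": one column collapses to a vector
class(b[-duprows, , drop = FALSE])    # "data.frame"
```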

What do you think about this?

Sample data (your data, made reproducible ;)):

require(tidyverse)

a <- data.frame(ID = c("TMCS09g1008676",
                       "TMCS09g1008677",
                       "TMCS09g1008678",
                       "TMCS09g1008679",
                       "TMCS09g1008680",
                       "TMCS09g1008681",
                       "TMCS09g1008682",
                       "TMCS09g1008683"),
                Staus = rep("fleshy"))

b <- data.frame(ID = c("TMCS09g1008684",
                       "TMCS09g1008685",
                       "TMCS09g1008686",
                       "TMCS09g1008682",
                       "TMCS09g1008688",
                       "TMCS09g1008689",
                       "TMCS09g1008690",
                       "TMCS09g1008691",
                       "TMCS09g1008683",
                       "TMCS09g1008693"),
                Staus = rep("rotten"))

c <- data.frame(ID = c("TMCS09g1008695",
                       "TMCS09g1008696",
                       "TMCS09g1008697",
                       "TMCS09g1008698",
                       "TMCS09g1008683",
                       "TMCS09g1008700",
                       "TMCS09g1008701",
                       "TMCS09g1008702",
                       "TMCS09g1008703",
                       "TMCS09g1008704",
                       "TMCS09g1008705"),
                Staus = rep("good"))

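You can use plyr::join_all() to do the matching. It performs a left join on ID, so every ID in a is kept, and because each data frame has its own Staus column, the values for an ID that appears in several data frames end up side by side in separate columns of the same row.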

plyr::join_all(list(a,b,c), by = "ID")
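join_all() defaults to type = "left". If plyr is not installed, the same left join can be sketched in base R by chaining merge(all.x = TRUE) with Reduce(). This is a swapped-in base R technique, not what plyr does internally; left_join_by_id() is a hypothetical helper name, and merge() renames the clashing columns to Staus.x and Staus.y, so the headers differ from the join_all() output.

```r
# Same sample data as above (IDs in an ID column, status in Staus).
a <- data.frame(ID = paste0("TMCS09g", 1008676:1008683),
                Staus = "fleshy", stringsAsFactors = FALSE)
b <- data.frame(ID = paste0("TMCS09g", c(1008684L, 1008685L, 1008686L, 1008682L,
                                         1008688L, 1008689L, 1008690L, 1008691L,
                                         1008683L, 1008693L)),
                Staus = "rotten", stringsAsFactors = FALSE)
c <- data.frame(ID = paste0("TMCS09g", c(1008695L, 1008696L, 1008697L, 1008698L,
                                         1008683L, 1008700L, 1008701L, 1008702L,
                                         1008703L, 1008704L, 1008705L)),
                Staus = "good", stringsAsFactors = FALSE)

# Hypothetical helper: left join on ID, keeping every row of x.
left_join_by_id <- function(x, y) merge(x, y, by = "ID", all.x = TRUE)

# Join a with b, then join that result with c.
res <- Reduce(left_join_by_id, list(a, b, c))
res  # 8 rows; columns ID, Staus.x (from a), Staus.y (from b), Staus (from c)
```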

Results:

              ID  Staus  Staus Staus
1 TMCS09g1008676 fleshy   <NA>  <NA>
2 TMCS09g1008677 fleshy   <NA>  <NA>
3 TMCS09g1008678 fleshy   <NA>  <NA>
4 TMCS09g1008679 fleshy   <NA>  <NA>
5 TMCS09g1008680 fleshy   <NA>  <NA>
6 TMCS09g1008681 fleshy   <NA>  <NA>
7 TMCS09g1008682 fleshy rotten  <NA>
8 TMCS09g1008683 fleshy rotten  good
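The join_all() result keeps only the IDs of a and spreads the values over three columns. If you would rather stay with rbind() as in the question and end up with a single source column in which a's value wins for duplicated IDs, a minimal sketch follows. It assumes the question's layout (IDs as row names, one column source), and make_df() is a hypothetical helper used only to rebuild the sample data.

```r
# Hypothetical helper to rebuild the question's layout:
# IDs as row names, one character column "source".
make_df <- function(ids, value) {
  data.frame(source = rep(value, length(ids)),
             row.names = paste0("TMCS09g", ids),
             stringsAsFactors = FALSE)
}
a <- make_df(1008676:1008683, "fleshy")
b <- make_df(c(1008684L, 1008685L, 1008686L, 1008682L, 1008688L,
               1008689L, 1008690L, 1008691L, 1008683L, 1008693L), "rotten")
c <- make_df(c(1008695L, 1008696L, 1008697L, 1008698L, 1008683L, 1008700L,
               1008701L, 1008702L, 1008703L, 1008704L, 1008705L), "good")

# Start from a and append only rows whose IDs are not present yet,
# so the value from the earlier data frame wins for duplicated IDs.
combined <- a
for (next_df in list(b, c)) {
  is_new <- !rownames(next_df) %in% rownames(combined)
  # drop = FALSE keeps the one-column subset a data frame instead of a vector
  combined <- rbind(combined, next_df[is_new, , drop = FALSE])
}

nrow(combined)                         # 26
combined["TMCS09g1008683", "source"]   # "fleshy"
```

Because each data frame is filtered against the rows collected so far, the loop also handles IDs that are shared only between b and c: whichever data frame comes first in the list keeps its value.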
