
How to combine items from two columns in two separate files?

I have two tables which I need to compare

Table 1: XLOC IDs

Column A: XLOC ID

Column B: gene ID

Table 2: Ensembl IDs

Column A: Ensembl ID

Column B: gene ID

Both tables contain identical gene IDs (gene names, e.g. cpa6). Table 1 has 25000 entries and Table 2 has 46000 entries.

I need to insert the Ensembl IDs from column A of Table 2 into column C of Table 1 wherever the gene IDs in column B match, and create an output file with the new data, e.g.

Table 1:

ENS0002 cpa6

Table 2:

Xloc0014 cpa6

Output file, table 3:

ENS0002 cpa6 Xloc0014

The columns are not in the same order and cannot be sorted alphabetically etc. I will get rid of the remaining 21000 entries without corresponding XLOCs (I can easily do this post-output).

Does anyone know how to do this relatively easily in R, Excel, or other software?

NB: The two tables cannot be sorted into the same order, so I really need a formula or script (e.g. bash) to do this.

Thanks

Try this. I have created example data frames to show how you can merge two tables and keep only the values that exist in both.

As you can see, the new table contains only the values that exist in both tables, and it now has 3 columns, including the value from the second table.

To keep only the rows that exist in both of your tables, merge on the gene ID column, for example newTable <- merge(tab1,tab2,by = "gen_id").

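# Example tables: "col1" is the shared key column
tab1 <- data.frame(col1=c("id1","id2","id3","id4"),col2=c(1,2,3,4))
tab2 <- data.frame(col1=c("id1","id2","id3","id5","id7"),col2=c(1,3,3,5,6))
# Inner join: keeps only the ids present in both tables (id1, id2, id3)
newTable <- merge(tab1,tab2,by = "col1")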

If you want to keep every row of table 1, even those with no match in table 2, use this:

newTable <- merge(tab1,tab2,by = "col1",all.x = TRUE)

This keeps all the rows of table 1 and fills in col2.y where a match exists; otherwise col2.y is NA.
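Applied to the tables in the question, the same merge might look like the sketch below. The column names (xloc_id, ensembl_id, gene_id), the toy values, and the output file name are assumptions for illustration; the real tables would first be read in with read.table or read.csv.

```r
# Toy stand-ins for the two tables; all column names are hypothetical.
tab_xloc <- data.frame(xloc_id = c("Xloc0014", "Xloc0020", "Xloc0031"),
                       gene_id = c("cpa6", "abc1", "xyz9"),
                       stringsAsFactors = FALSE)
tab_ens <- data.frame(ensembl_id = c("ENS0007", "ENS0002", "ENS0100"),
                      gene_id = c("abc1", "cpa6", "nomatch"),
                      stringsAsFactors = FALSE)

# Inner join on the gene id; the row order of the inputs does not matter.
matched <- merge(tab_ens, tab_xloc, by = "gene_id")

# Reorder the columns to the layout asked for in the question.
matched <- matched[, c("ensembl_id", "gene_id", "xloc_id")]

# Write a tab-separated output file (the file name is an assumption).
out_file <- file.path(tempdir(), "table3.txt")
write.table(matched, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Note that merge returns one row per matching pair, so a gene id that occurs several times in either table will appear several times in the output.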

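In R I would use the merge function: merge(Table1, Table2, by = "gene_id"), where by is the name of the shared gene ID column (gene_id is a placeholder for whatever that column is called), not a value such as cpa6.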

However, I have done this in Excel before using the VLOOKUP function, which worked well too. You just need an IF function with a nested VLOOKUP inside:

=IF(ISERROR(VLOOKUP(cell with gene name in Table 1, range of cells that contain the gene names in Table 2, number of the column to return from that range, FALSE so they match exactly)), output if no match is found, output if a match is found).

Example:

=IF(ISERROR(VLOOKUP(C4,List1!A$2:A$1000,1,FALSE)), "Does NOT exist in List 1","Exists in List 1")
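For comparison, a rough R counterpart of the exact-match VLOOKUP approach is base R's match function. This is only a sketch: the data frames and column names below (xloc_tab, ens_tab, xloc_id, ensembl_id, gene_id) are made up for illustration.

```r
# Hypothetical example tables, deliberately in different row orders.
xloc_tab <- data.frame(xloc_id = c("Xloc0014", "Xloc0020"),
                       gene_id = c("cpa6", "xyz9"),
                       stringsAsFactors = FALSE)
ens_tab <- data.frame(ensembl_id = c("ENS0100", "ENS0002"),
                      gene_id = c("abc1", "cpa6"),
                      stringsAsFactors = FALSE)

# For each gene id in xloc_tab, find the position of its first exact
# match in ens_tab (NA when the gene id is absent).
idx <- match(xloc_tab$gene_id, ens_tab$gene_id)

# Use those positions to pull the Ensembl ids into a new third column.
xloc_tab$ensembl_id <- ens_tab$ensembl_id[idx]

# Optionally drop the rows that found no match.
xloc_matched <- xloc_tab[!is.na(xloc_tab$ensembl_id), ]
```

Like VLOOKUP, match only returns the first hit, so this keeps the row count of the first table unchanged, whereas merge would return every matching pair.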
