bash: using 2 variables from same file and sed

Question

I have two files:

file1.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079
rs111285978:45000103:A:AT 45000103
rs190363568:45000168:C:T 45000168

file2.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069
rs111285978:45000103:A:AT rs111285978
rs190363568:45000168:C:T rs190363568

Using file2.txt, I want to replace the names in column 1 of file1.txt (which match column 1 of file2.txt) with the corresponding entry in column 2 of file2.txt. The output file would then be:

rs142159069 45000079
rs111285978 45000103
rs190363568 45000168

I have tried reading in the columns of file2.txt, but without success:

while read -r a b
do
cat file1.txt | sed s'/$a/$b/'
done < file2.txt

I am quite new to bash. I am also not sure how to write the result of my command to an output file. Any help would be deeply appreciated.

Answer 1

In your case, awk or perl would be easier, if you are willing to accept an answer without sed:

awk '(NR==FNR){out[$1]=$2;next}{out[$1]=out[$1]" "$2}END{for (i in out){print out[i]} }' file2.txt file1.txt > output.txt

output.txt:

rs142159069 45000079
rs111285978 45000103
rs190363568 45000168

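Note: this assumes that all names in column 1 are unique and that they are all present in both files. Also, awk does not guarantee the iteration order of for (i in out), so the lines may come out in a different order than in file1.txt.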

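Explanation:

(NR==FNR){out[$1]=$2;next} : while parsing the first file on the command line (file2.txt), build a map keyed by the long name in column 1, with the short name from column 2 as its value
{out[$1]=out[$1]" "$2} : for each line of the second file (file1.txt), append a space and the value from column 2 to the entry for that name
END{for (i in out){print out[i]} } : once both files are read, print all the values in the map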
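
If the order of file1.txt has to be preserved, the same lookup idea can be sketched without the END loop by printing each line of file1.txt as it is read. This is only a variant under stated assumptions, not the command from the answer above: the array name `name` is arbitrary, the sample data is copied from the question, and passing through lines with no match unchanged is an assumption about the desired behaviour.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Sample data copied from the question.
cat > file1.txt <<'EOF'
rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079
rs111285978:45000103:A:AT 45000103
rs190363568:45000168:C:T 45000168
EOF

cat > file2.txt <<'EOF'
rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069
rs111285978:45000103:A:AT rs111285978
rs190363568:45000168:C:T rs190363568
EOF

# First file (file2.txt): remember long name -> short name.
# Second file (file1.txt): print short name and position, in file1.txt order.
# Lines whose name has no entry in file2.txt are printed unchanged
# (an assumption about what is wanted in that case).
awk 'NR==FNR { name[$1] = $2; next }
     ($1 in name) { print name[$1], $2; next }
     { print }' file2.txt file1.txt > output.txt

cat output.txt
```

Because each output line is produced while file1.txt is being read, the result follows the order of file1.txt and does not depend on how awk iterates over the array.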

Answer 2

Apparently $2 of file2.txt is just the first part of $1 of file1.txt, so you could use awk on file1.txt alone and redefine FS to split on both ":" and space:

$ awk -F"[: ]" '{print $1,$NF}' file1.txt
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168
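
Since the question asked about sed, the same observation can probably be expressed with sed as well. The following is a minimal sketch, assuming the short name is always the text before the first colon and that the two columns are separated by a single space; the sample data is copied from the question.

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)"

# Sample data copied from the question.
cat > file1.txt <<'EOF'
rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079
rs111285978:45000103:A:AT 45000103
rs190363568:45000168:C:T 45000168
EOF

# On each line, delete everything from the first colon up to
# (but not including) the first space, leaving "name position".
sed 's/:[^ ]*//' file1.txt > output.txt

cat output.txt
```

As for the loop in the question: inside single quotes the shell does not expand $a and $b, so sed looked for the literal text and found nothing to replace. The > output.txt redirection shown here is also how to write the result to a file.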
