
bash: using 2 variables from same file and sed

I have two files:

file1.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079
rs111285978:45000103:A:AT 45000103
rs190363568:45000168:C:T 45000168

file2.txt
rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069
rs111285978:45000103:A:AT rs111285978
rs190363568:45000168:C:T rs190363568

Using file2.txt, I want to replace the names in column 1 of file1.txt (which also appear in column 1 of file2.txt) with the corresponding entry in column 2 of file2.txt. The output file would then be:

rs142159069 45000079
rs111285978 45000103
rs190363568 45000168

I have tried reading the columns of file2.txt in a loop, but without success:

while read -r a b
do
cat file1.txt | sed s'/$a/$b/'
done < file2.txt

I am quite new to bash. Also, I am not sure how to write the output to a file with my command. Any help would be deeply appreciated.
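
The loop above fails for two reasons. Inside single quotes the shell does not expand $a and $b, so sed looks for the literal text "$a". Each iteration also reads the whole of file1.txt and prints it again. A minimal sed-only sketch, assuming the names contain no spaces, no "|" and no regular-expression metacharacters (true for the sample data), is to turn file2.txt into a sed script and apply it once. The file name rules.sed is a placeholder chosen here, not something from the question.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  'rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079' \
  'rs111285978:45000103:A:AT 45000103' \
  'rs190363568:45000168:C:T 45000168' > file1.txt
printf '%s\n' \
  'rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069' \
  'rs111285978:45000103:A:AT rs111285978' \
  'rs190363568:45000168:C:T rs190363568' > file2.txt

# Turn each "old new" line of file2.txt into one sed command: s|^old |new |
# (rules.sed is a hypothetical file name used only for this sketch).
sed 's/^\([^ ]*\) \(.*\)$/s|^\1 |\2 |/' file2.txt > rules.sed

# Apply all substitutions in a single pass and redirect the result to a file.
sed -f rules.sed file1.txt > output.txt
```

The "> output.txt" redirection at the end is what writes the result to a file; it works the same way for any command that prints to standard output.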

In your case, awk or perl would be easier, if you are willing to accept an answer without sed:

awk '(NR==FNR){out[$1]=$2;next}{out[$1]=out[$1]" "$2}END{for (i in out){print out[i]} }' file2.txt file1.txt > output.txt

output.txt :

rs142159069 45000079
rs111285978 45000103
rs190363568 45000168

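Note: this assumes that all names in column 1 are unique and that they are all present in both files. awk does not guarantee the iteration order of for (i in out), so the output lines may appear in a different order than in file1.txt.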

explanation:

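  • (NR==FNR){out[$1]=$2;next} : while parsing the first file given on the command line (file2.txt), create a map with the name from the first column as key and the second column as value
  • {out[$1]=out[$1]" "$2} : for each line of the second file (file1.txt), append its second column to the value stored under the same key
  • END{for (i in out){print out[i]} } : once both files are read, print all the values in the map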
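
If the output must keep the line order of file1.txt, which for (i in out) does not guarantee, one possible variant prints each line as file1.txt is read instead of waiting for the END block. This is a sketch under the same assumption that names are unique; the array name map is only illustrative, and lines whose name is missing from file2.txt are passed through unchanged.

```shell
# Recreate the sample input from the question.
printf '%s\n' \
  'rs142159069:45000079:TACTTCTTGGACATTTCC:T 45000079' \
  'rs111285978:45000103:A:AT 45000103' \
  'rs190363568:45000168:C:T 45000168' > file1.txt
printf '%s\n' \
  'rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069' \
  'rs111285978:45000103:A:AT rs111285978' \
  'rs190363568:45000168:C:T rs190363568' > file2.txt

awk 'NR==FNR { map[$1] = $2; next }   # first file (file2.txt): old name -> new name
     ($1 in map) { $1 = map[$1] }     # second file (file1.txt): swap in the new name if known
     { print }                        # print the line in its original position
    ' file2.txt file1.txt > output.txt
```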

Apparently $2 of file2.txt is the first colon-separated part of $1 of file1.txt, so you could skip file2.txt entirely, use awk and redefine FS to split on both colons and spaces:

$ awk -F"[: ]" '{print $1,$NF}' file1.txt
rs142159069 45000079
rs111285978 45000103
rs190363568 45000168
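
This shortcut relies on column 2 of file2.txt always being the part of column 1 before the first colon. One way to check that before trusting it might look like the sketch below: it writes every line of file2.txt where the two differ to a file, so an empty result suggests the shortcut is safe for the names listed there. The file name mismatches.txt is a placeholder chosen for this example.

```shell
# Recreate the sample mapping file from the question.
printf '%s\n' \
  'rs142159069:45000079:TACTTCTTGGACATTTCC:T rs142159069' \
  'rs111285978:45000103:A:AT rs111285978' \
  'rs190363568:45000168:C:T rs190363568' > file2.txt

# Split column 1 on ":" and compare its first piece with column 2;
# any line where they differ is reported (mismatches.txt is hypothetical).
awk '{ split($1, parts, ":"); if (parts[1] != $2) print "mismatch: " $0 }' file2.txt > mismatches.txt
```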


 