
How to use while read line to print filename into each line of a text file

In a folder I have the following text files:

$ ls
listofdirectories
CTCF_BEDfiles EZH2_BEDfiles H2AFZ_BEDfiles ... +30 or so more *BEDfiles

I am trying to pipe each line of listofdirectories into an awk print command, so that every line in the *BEDfiles changes from a bare BED file name to the path of the directory I wish to store it in. (The *BEDfiles are all compressed text files.)

$ cat listofdirectories
CTCF_assay
EZH2_assay
H2AFZ_assay
... etc.

$ zcat CTCF_BEDfiles
ENCFF509KKI.bed.gz
ENCFF509KKI.bed.gz
ENCFF490CTJ.bed.gz
... etc.

I have a directory for each line of listofdirectories, e.g. ~/folder/CTCF_assay, and wish to convert each line of each *BEDfiles text file into a path that places the BED file in its appropriate folder. All of the new path lines can be stored in a single text file, pathsforBEDfiles.

Desired Outcome:
$ cat pathsforBEDfiles
~/folder/CTCF_assay/ENCFF509KKI.bed.gz
~/folder/CTCF_assay/ENCFF509KKI.bed.gz
~/folder/CTCF_assay/ENCFF490CTJ.bed.gz
... etc.

I have tried the following:

$ cat listofdirectories | while read line ; do zcat "${line%assay}BEDfiles" | awk '{print "~/folder/"$line"/"$0"/"}'; done > pathsforBEDfiles

This nearly worked for me, but the $line in the awk command prints lines from the *BEDfiles rather than lines from listofdirectories.

What the above code returns (viewed with less -S):

~/folder/ENCFF509KKI.bed.gz/ENCFF509KKI.bed.gz/
~/folder/ENCFF509KKI.bed.gz/ENCFF509KKI.bed.gz/
~/folder/ENCFF490CTJ.bed.gz/ENCFF490CTJ.bed.gz/
... etc.

Any idea how I can get the $line in the awk command to print *_assay from listofdirectories instead of ENCFF* from *BEDfiles?

Thanks, Steven

As written, the awk script cannot access the bash variable $line: the awk program sits inside single quotes, so the shell never expands it.

Inside awk, $line is a field reference whose field number is the value of the awk variable line. Since line is never defined in awk, it is uninitialized and evaluates to 0 in a numeric context, so $line becomes $0, which refers to the entire input line. OP's current awk print is therefore effectively doing the following:

print "~/folder/" $0 "/" $0 "/"

Hence the reason we're seeing the contents from the zcat'd file echoed twice in the output.
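
The difference can be reproduced without any of the real files. The following is a minimal sketch; the sample value and the variable names unset_result and passed_result are made up for illustration:

```shell
# Hypothetical sample record standing in for one line of zcat output.
sample='ENCFF509KKI.bed.gz'

# 'line' is never set inside awk, so $line is evaluated as $0 (the whole record).
unset_result=$(printf '%s\n' "$sample" | awk '{print "~/folder/" $line "/" $0}')

# With -v, 'line' becomes an ordinary awk string variable holding the shell value.
passed_result=$(printf '%s\n' "$sample" | awk -v line="CTCF_assay" '{print "~/folder/" line "/" $0}')

printf '%s\n' "$unset_result" "$passed_result"
```

The first result repeats the file name in the directory position, while the second contains CTCF_assay there.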


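One idea for updating OP's current code: pass the bash variable into awk with -v, and drop the trailing "/" so the output matches the desired outcome:

while read -r line
do
    zcat "${line%assay}BEDfiles" | awk -v line="${line}" '{print "~/folder/" line "/" $0}'
done < listofdirectories > pathsforBEDfiles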
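
To try the loop without touching real data, here is a self-contained sketch that builds a throwaway directory shaped like the one in the question. The EZH2 file name is invented, gzip -dc is used as a portable stand-in for zcat, and the print omits a trailing "/" so the lines match the desired outcome:

```shell
# Build a temporary directory with hypothetical sample data mirroring the question's layout.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' CTCF_assay EZH2_assay > listofdirectories
printf '%s\n' ENCFF509KKI.bed.gz ENCFF490CTJ.bed.gz | gzip > CTCF_BEDfiles
printf '%s\n' ENCFF000AAA.bed.gz | gzip > EZH2_BEDfiles   # made-up file name

while IFS= read -r line
do
    # ${line%assay} strips the "assay" suffix, so CTCF_assay maps to CTCF_BEDfiles.
    # -v copies the shell value into the awk variable 'line'.
    gzip -dc "${line%assay}BEDfiles" | awk -v line="$line" '{print "~/folder/" line "/" $0}'
done < listofdirectories > pathsforBEDfiles

cat pathsforBEDfiles
```

Note that the ~ is written to the file literally. Whatever reads pathsforBEDfiles later will not expand it unless it goes through the shell, so substituting "$HOME" for "~" in the awk string may be preferable if absolute paths are needed.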
