In a folder I have the following text files:
$ ls
listofdirectories
CTCF_BEDfiles EZH2_BEDfiles H2AFZ_BEDfiles ... +30 or so more *BEDfiles
What I am trying to do is pipe each line of listofdirectories into an awk print function to change all of the lines in *BEDfiles from just the BEDfile name to the directory I wish to store it in. (The *BEDfiles are all compressed text files.)
$ cat listofdirectories
CTCF_assay
EZH2_assay
H2AFZ_assay
... etc.
$ zcat CTCF_BEDfiles
ENCFF509KKI.bed.gz
ENCFF509KKI.bed.gz
ENCFF490CTJ.bed.gz
... etc.
I have a directory for each line of listofdirectories, e.g. ~/folder/CTCF_assay, and wish to convert each line of each *BEDfiles text file into a path that stores the BEDfile in its appropriate folder. All of the new paths can be stored in a single text file, pathsforBEDfiles.
Desired Outcome:
$ cat pathsforBEDfiles
~/folder/CTCF_assay/ENCFF509KKI.bed.gz
~/folder/CTCF_assay/ENCFF509KKI.bed.gz
~/folder/CTCF_assay/ENCFF490CTJ.bed.gz
... etc.
I have tried the following:
$ cat listofdirectories | while read line ; do zcat "${line%assay}BEDfiles" | awk '{print "~/folder/"$line"/"$0"/"}'; done > pathsforBEDfiles
This nearly worked for me, but the "$line" in the awk command is printing out lines from *BEDfiles rather than lines from listofdirectories.
What the above code returns (viewed with less -S):
~/folder/ENCFF509KKI.bed.gz/ENCFF509KKI.bed.gz/
~/folder/ENCFF509KKI.bed.gz/ENCFF509KKI.bed.gz/
~/folder/ENCFF490CTJ.bed.gz/ENCFF490CTJ.bed.gz/
... etc.
Any idea how I can get the $line in the awk command to print *_assay from listofdirectories instead of ENCFF* from *BEDfiles?
Thanks, Steven
As written, the current script is unable to access the bash variable $line from inside awk: the awk program is enclosed in single quotes, so the shell never expands it.
In awk, $line is a field reference whose field number is whatever value is stored in the awk variable line. Since line is never defined, it evaluates to 0, leaving us with a reference to $0, which is how we reference the entire line of input. This in turn means OP's current awk print is doing the following:
print "~/folder/" $0 "/" $0 "/"
Hence the reason we're seeing the contents from the zcat'd file echoed twice in the output.
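To see this behaviour in isolation, here is a small sketch that feeds one sample line from the question through the same print statement. The variable name demo_out is just a placeholder for this illustration; nothing here reads OP's real files.

```shell
# 'line' is never assigned inside awk, so $line is evaluated as $0,
# i.e. the whole input record, and the record is printed twice.
demo_out=$(printf 'ENCFF509KKI.bed.gz\n' | awk '{print "~/folder/" $line "/" $0 "/"}')
printf '%s\n' "$demo_out"
```

The printed line has the same doubled shape as the output OP posted.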
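One idea for updating OP's current code, passing the bash variable into awk with -v (the trailing "/" is dropped here to match the desired outcome):
while read -r line
do
    # -v copies the shell variable into an awk variable of the same name
    zcat "${line%assay}BEDfiles" | awk -v line="${line}" '{print "~/folder/" line "/" $0}'
done < listofdirectories > pathsforBEDfiles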
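For anyone who wants to try the approach without touching real data, the following is a rough, self-contained sketch that builds a throwaway copy of the layout in a temporary directory and runs the same kind of loop. Several things in it are assumptions made for the demo: the temporary directory, the awk variable name dir, the made-up file name ENCFF000AAA.bed.gz, and the use of gzip -dc reading from stdin in place of zcat (the two should behave the same here, but gzip -dc avoids differences between zcat implementations).

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample inputs: a directory list plus one compressed name list per assay.
printf 'CTCF_assay\nEZH2_assay\n' > listofdirectories
printf 'ENCFF509KKI.bed.gz\nENCFF490CTJ.bed.gz\n' | gzip -c > CTCF_BEDfiles
printf 'ENCFF000AAA.bed.gz\n' | gzip -c > EZH2_BEDfiles   # hypothetical name

while IFS= read -r line
do
    # ${line%assay} strips the trailing "assay", e.g. CTCF_assay -> CTCF_
    # -v dir=... hands the shell value to awk as the awk variable 'dir'
    gzip -dc < "${line%assay}BEDfiles" |
        awk -v dir="$line" '{print "~/folder/" dir "/" $0}'
done < listofdirectories > pathsforBEDfiles

cat pathsforBEDfiles
```

One thing to keep in mind: the ~ written by awk is a literal character in the output file. The shell will not expand it when the paths are later read back from pathsforBEDfiles, so if the paths are going to be consumed by another script it may be safer to print "$HOME" (passed in with another -v) instead.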