简体   繁体   中英

bash loop file in one folder against all files in another folder

I am trying to do a ( bedtools ) intersect. Anyone see what is wrong with my bash code here?

#!/bin/bash

dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds

for file in $query/*; do 
   bedtools intersect -wa -wb -a $file -b $pgs/* -c -C -sorted -filenames
done > ${file%.*}-results.txt

"query" directory contains many files which need to each be iteratively queried against many files in "pgs" directory using the package bedtools and command intersect, which allows an asterisk for file "-b" to cycle multiple files.

Per usage:

One or more BAM/BED/GFF/VCF file(s) “B”. Use “stdin” if passing B with a UNIX pipe.
NEW!!!: -b may be followed with multiple databases and/or wildcard (*) character(s).

I believe the problem is with $pgs/*, looping one file against multiple files in another directory and I may need to restructure my loop to change directory or something.

UPDATE:

I have updated the script to work and was able to get everything working with:

#!/bin/bash -x
dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds
for f in $(ls $query/*.bed); do 
   for g in $(ls $pgs/*); do
      bedtools intersect -wa -wb -a $f -b $g* -C -filenames
   done > $(basename $f .ext).txt
done

In your code the output of the whole loop is redirected to one file. Maybe you want something like below (redirect the output of each bedtool to a separate file).

Also note that if you use ${file%.*} the outputs will be created in the $query directory so executing the script twice might result in unexpected results. I would recommend to use $(basename $file.ext).txt (where.ext is the extension of the files in $query . The basename command keeps only the name of the file (so without the directory) and also removes the.ext if present.

#!/bin/bash

dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds

for file in $query/*; do 
   bedtools intersect -wa -wb -a $file -b $pgs/* -c -C -sorted -filenames > ${file%.*}-results.txt
done 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM