Using awk to print header name and a substring
I tried using this code to print a header with a gene name and then pull a substring based on its location, but it doesn't work:
>output_file
cat input_file | while read row; do
echo $row > temp
geneName=`awk '{print $1}' tmp`
startPos=`awk '{print $2}' tmp`
endPOs=`awk '{print $3}' tmp`
for i in temp; do
echo ">${geneName}" >> genes_fasta ;
echo "awk '{val=substr($0,${startPos},${endPOs});print val}' fasta" >> genes_fasta
done
done
input_file
nad5_exon1 250405 250551
nad5_exon2 251490 251884
nad5_exon3 195620 195641
nad5_exon4 154254 155469
nad5_exon5 156319 156548
fasta
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc............
And this is my wrong output file:
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
>
awk '{val=substr(pull_genes.sh,,);print val}' unwraped_carm_mt.fasta
The output should look like this:
>name1
atgcatgcatgcatgcatgcat
>name2
tgcatgcatgcatgcat
>name3
gcatgcatgcatgcatgcat
>namen....
You can do this with a single call to awk, which will be orders of magnitude more efficient than looping in a shell script and calling awk four times per iteration. Since you have bash, you can use command substitution to pass the contents of fasta to an awk variable, and then output the heading followed by the substring running from the beginning position through the ending position in your fasta file.
For example:
awk -v fasta="$(<fasta)" '{print ">" $1; print substr(fasta,$2,$3-$2+1)}' input
Or using getline within the BEGIN rule:
awk 'BEGIN{getline fasta<"fasta"}
{print ">" $1; print substr(fasta,$2,$3-$2+1)}' input
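One caveat worth sketching: a single getline reads only the first line of fasta, which is fine for an unwrapped sequence like the one in the question. If the sequence were wrapped over several lines, a getline loop could concatenate them first. The following is a minimal sketch under that assumption; the scratch directory, the demo data, and the names demo_exon1 and demo_exon2 are hypothetical and not from the question.

```shell
# Work in a scratch directory so the demo files do not clobber real data
tmpdir=$(mktemp -d)

# Hypothetical demo data: a 16-base sequence wrapped over two lines
printf 'atgcatgc\natgcatgc\n' > "$tmpdir/fasta"
printf 'demo_exon1 1 5\ndemo_exon2 7 12\n' > "$tmpdir/input"

result=$(cd "$tmpdir" && awk 'BEGIN {
    # Append every line of "fasta" to one string, not just the first line
    while ((getline line < "fasta") > 0) fasta = fasta line
    close("fasta")
}
{ print ">" $1; print substr(fasta, $2, $3 - $2 + 1) }' input)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The second demo range (7 to 12) deliberately crosses the line break in the wrapped file, so it only comes out right if the lines were joined before substr is applied.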
Example Input Files
Note: the beginning and ending values have been reduced to fit within the 129 characters of your example:
$ cat input
rad5_exon1 1 17
rad5_exon2 23 51
rad5_exon3 110 127
rad5_exon4 38 62
rad5_exon5 59 79
And the first 129 characters of your example fasta:
$ cat fasta
atgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgcatgc
Example Use/Output
$ awk -v fasta="$(<fasta)" '{print ">" $1; print substr(fasta,$2,$3-$2+1)}' input
>rad5_exon1
atgcatgcatgcatgca
>rad5_exon2
gcatgcatgcatgcatgcatgcatgcatg
>rad5_exon3
tgcatgcatgcatgcatg
>rad5_exon4
tgcatgcatgcatgcatgcatgcat
>rad5_exon5
gcatgcatgcatgcatgcatg
Look things over and let me know if I understood your question's requirements. Also let me know if you have further questions on the solution.
If I'm understanding correctly, how about:
awk 'NR==FNR {fasta = fasta $0; next}
{
    printf(">%s\n%s\n", $1, substr(fasta, $2, $3 - $2 + 1))
}' fasta input_file > genes_fasta
The first block (NR==FNR) reads the fasta file and stores the sequence in a variable fasta.
The second block processes input_file line by line and extracts the substring of fasta starting at $2 and of length $3 - $2 + 1.
(Note that the 3rd argument to the substr function is a length, not an end position.) Hope this helps.
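To make the length-versus-end-position point concrete, here is a small sketch on a hypothetical 10-character string (the string and the variable names wrong and right are illustrative only). It extracts positions 3 through 5 inclusive, once passing the end position by mistake and once passing the computed length.

```shell
# Hypothetical 10-character string; the goal is positions 3 through 5 inclusive
wrong=$(awk 'BEGIN { s = "abcdefghij"; print substr(s, 3, 5) }')          # 5 is treated as a length
right=$(awk 'BEGIN { s = "abcdefghij"; print substr(s, 3, 5 - 3 + 1) }')  # length = end - start + 1

echo "substr(s,3,5)     -> $wrong"
echo "substr(s,3,5-3+1) -> $right"
```

The first call returns five characters starting at position 3, while the second returns exactly the three characters at positions 3, 4 and 5.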
Made it work! This is the script for pulling substrings from a fasta file:
# Read the gene name, start and end of each row straight into variables
while read -r geneName startPos endPos; do
    # Length as computed in the original script; use end - start + 1 if the end position is inclusive
    length=$((endPos - startPos))
    echo ">${geneName}" >> genes_fasta
    awk -v S="$startPos" -v L="$length" '{print substr($0,S,L)}' "unwraped_${fasta}" >> genes_fasta
done < genes_and_bounderies1