
How can I grep patterns in a FASTA file, then print all the corresponding sequences to a FASTA file?

I've had so much help with code errors when I've used this page. I really appreciate how much everyone wants to help out! I am brand new to any kind of coding, and whoa, this is a learning curve.

Anyway, I am trying to use the grep command to search for sequences that match a list of patterns in my patterns.txt file. I want to take those sequences, write a new .fasta file for each one, and put the files into a new directory. I am trying to do this all in bash.

This is the bash script I wrote, but I'm not getting any output at all (except that the directory gets created, which is not that useful):

Edit, for clarity: I need to create a FASTA file for each pattern, containing all the genes (and their corresponding sequences) that have that pattern. I want to name each file after the pattern and output it into a new directory. I think my original question was a little confusing!

mkdir SEQUENCE-MATCHES



for pattern; do
        grep -B1 $pattern my_file.fasta
done < patterns_file.txt > SEQUENCE-MATCHES/$pattern.fasta

This isn't outputting anything at all. I could manually run grep on each of the patterns, but that would take too long to be realistic.

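The loop header for pattern; do doesn't set pattern the way you want. With no in list, for iterates over the script's positional parameters ($1, $2, ...), not over the lines redirected into the loop, so with no arguments the body never runs.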

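Also, you merged all the output into a single file: the > redirection is attached to done, so it applies to the whole loop, and $pattern in the file name is expanded once, before the loop runs, while it is still unset.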
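To see the first point in isolation, here is a minimal sketch. The helper names show_args and show_lines are hypothetical, made up for this illustration; they only echo what each kind of loop receives when two lines are piped in.

```shell
#!/usr/bin/env bash
# show_args and show_lines are hypothetical helpers for illustration only.

# "for pattern; do" is shorthand for: for pattern in "$@"; do
# It walks the arguments and ignores standard input.
show_args() {
    for pattern; do
        echo "for saw: $pattern"
    done
}

# A while/read loop consumes standard input one line at a time.
show_lines() {
    while IFS= read -r pattern; do
        echo "read saw: $pattern"
    done
}

# No arguments are passed, so this loop body never runs,
# even though two lines are waiting on standard input.
printf 'ATG\nTTGACA\n' | show_args

# The same input reaches the loop body here, once per line.
printf 'ATG\nTTGACA\n' | show_lines
```

Only the second call prints anything, which matches the silent behaviour described in the question.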

Try:

# Read one pattern per line; IFS= and -r keep whitespace and backslashes intact
while IFS= read -r pattern; do
    grep -B1 "$pattern" my_file.fasta > SEQUENCE-MATCHES/"$pattern".fasta
done < patterns_file.txt

Note that this code also creates an empty output file for each pattern that doesn't match.
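If the empty files are unwanted, one possible variation is to capture grep's output first and write a file only when something matched. This is a sketch under a few assumptions, not a definitive solution: the function name extract_matches is hypothetical, every FASTA record is assumed to be exactly two lines (header, then sequence), and the patterns are assumed to match sequence lines rather than headers. It also filters out the "--" separator lines that grep -B1 prints between non-adjacent matches.

```shell
#!/usr/bin/env bash
# extract_matches is a hypothetical helper, not part of the original answer.
# Arguments: FASTA file, file with one pattern per line, output directory.
# Assumes two-line records: a ">" header followed by one sequence line.
extract_matches() {
    local fasta=$1 patterns=$2 outdir=$3 pattern hits
    mkdir -p "$outdir"
    while IFS= read -r pattern; do
        # Skip blank lines: an empty pattern would match every line.
        [ -n "$pattern" ] || continue
        # -B1 pulls in the header line before each matching sequence line;
        # the second grep drops the "--" separators between match groups.
        hits=$(grep -B1 -- "$pattern" "$fasta" | grep -v '^--$')
        # Write the output file only when at least one record matched.
        if [ -n "$hits" ]; then
            printf '%s\n' "$hits" > "$outdir/$pattern.fasta"
        fi
    done < "$patterns"
}

# Usage with the file names from the question:
# extract_matches my_file.fasta patterns_file.txt SEQUENCE-MATCHES
```

For FASTA files whose sequences wrap over several lines, grep -B1 would not return whole records, so this approach only fits the two-line layout assumed here.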

