How to extract "blocks" of lines from a text file in a loop
I have a huge text file, and I want to use grep to check whether some "blocks" in it exist in another file. So I need to extract these blocks first.
This is my file:
>gi|60117238|gb|AY897435.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TCTGTTGCGAGTGTGCTGATAACTACTGAATCTATGATAGTTGATGTACCAAGCAAAGAAAATGCTTCATCTCCTATGGG
TGCAGGAGAAATGAGTGGCATGGGTGGATTCTAAGTAGAATGAAACCGTGGAGCAATTGCTCCACGGTAGTTCCAAAAAA
TCTCACATTTTACTATTCGTTAAAGGTAATACGTTTGGTGCAGAAATGCACTACTGTTTGCATCCGTTTCGCTCCTTTAT
ATTGTGGTTGTCTAATAACAAAAAGGCAGCATAAGAAAACTATAACACCTAGTATATTTATACTATAGCTGACCCAAGCA
ACACGTCATACCGCGATTCATTCCACAACTGTACGAACATTACAATATGGCACATAGTAAACGATGTCATGAAAGTAGCT
GACACTGGAATTCAGAAAAAAGGATTATGTCATTCCAGTGCTTGACACTGGAATCCAGCATTTCCATAATCATCAAAACA
TTGTATTTTAACAAAAAACATGTATTTTTATGCTTGCCAACTTAATAAAATTCCTGGATCCCAGTGTCAAGCACTGGGAT
GACAC
>gi|60117239|gb|AY897436.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TTTTCATCGCTCATGTCCTTAGTTTACCCCCTGTTTCACCATTACATTAATATCTACAGAACCTCCCACTGGGGAGTAGT
AATCTAGGATAGTTTCTATCACTAAAACGCGTGGTATTCCTTTATTTTTTACCAATTTTAAATAAGACAATACCTTATTA
TCATCATAATGCTGCAGAAAGCGGCAAAAGACACCTAATTCATAATTTGTAGCTGATAATTCTTCTTGAGTTATGAGTTT
AATTTTTAAATCTTCTACTGCCTGCCTAGGCACTTTATGTTCGTTGTAATAATATAAGCCTATAGAACCTTTATTGTGTA
TATCAGAATAAGCAAGAAATAAAGAGTGTACGCCAAATAGCAATATATTTTTAGCACCATCTATATTAACCCTAGAATTA
AACTCTTTAGTGTCAAACCTGGAATATCCTAGCAATGCTTGGTAAAACGCTATTTTCCTGTCTTCTGATGTTTCTTTCTC
CTTAAAAAGAATCAAATGAAAATATTGACTCCTGCCTTAAAATATCCGGCATTTTTAACCAATTCTTTTCAGCGGCAACC
CTTGCCCACATTGCTGCTGCTTTAGGAAAAATGGTATTTCTTTAAACACTTACCTTTTGATGAAAGTTGCCCAAAATCCT
TTGTTCTATCCGAATCCAAAACCCCTATTTCCCAAACGCCCCTTAAAACCTTTTTTAAAATTGGAACAAAAAATATTTAA
TTTTTAAAAAAAAACG
>gi|60117240|gb|AY897437.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TTGNCCATCAATTGGCCACCAGAAAAGTTGCGTCCGTTTACTTCTACACCATGTATAAATGCACCTAAAATCATGCCTTG
GCAAAATGCAGCACCAAGTGACCCAAAATGAAAGGCATAATCCCATAATCGCCTGTATTTTCCTTCTGCCTTAAAACGAA
ACTCAAAGGATACTCCGCGCACTATAAGGCCAAGCAGCATAATAATGATTGGAATATAAAAAGCAGGCATTAATATTGAA
TATGCAAGAGGAAAAGCAGCAAACAACCCTCCACCACCTAGTACCAACCATGTTTCGTTTCCATCCCAAAATGGTGCAAT
TGAGCTTATCATGTGATCACGGCATTTATCTGACGGTGCAAAAGGAAGTAAAATACCAATACCTAAATCAAACCCATCCA
TTAAAATATACAGTAAAACAGCTATGGCAATTAGTAATCCCCAGATTAGGGGTAAATTAATTAAGGAAGAAAAATCAAAC
ATGATTGTTGTCCTTTCCAGATGTACCAGCATCAATCACTGAAGCTCCAATACCGTGTTTATAAAATTGCTCTTCTTCTT
TAATGACAGGAATTCCTTTGTATATAAGTTTCAGAATATAGTATCTACCTGCTCCAAATATAAGGGTATACATAAACGAT
AAATGCAATCAAAGACCATGCAACCTGAGGACCGGTAATCGCAGATGAAAATGATTCAATTGTGCCGTTAATTCCATATA
CAGTGTAAAGTTGACGGCCAATTTCATAGTAAACCAAACTGCAAGTAACGCTATGGACCCCGACGGCATCTTTGAAATCC
ACAATCCTTTGAAAACACAACTTTGGAATAATTTGCCCCGAAAAATACTGAAAAAAAATTTACTGGACCCATTTTGGATT
ATTAAAATTTCAACTCCAACCATTTATACGGG
A block runs from a > line up to the last letter before the next >.
So the first block is:
TCTGTTGCGAGTGTGCTGATAACTACTGAATCTATGATAGTTGATGTACCAAGCAAAGAAAATGCTTCATCTCCTATGGG
TGCAGGAGAAATGAGTGGCATGGGTGGATTCTAAGTAGAATGAAACCGTGGAGCAATTGCTCCACGGTAGTTCCAAAAAA
TCTCACATTTTACTATTCGTTAAAGGTAATACGTTTGGTGCAGAAATGCACTACTGTTTGCATCCGTTTCGCTCCTTTAT
ATTGTGGTTGTCTAATAACAAAAAGGCAGCATAAGAAAACTATAACACCTAGTATATTTATACTATAGCTGACCCAAGCA
ACACGTCATACCGCGATTCATTCCACAACTGTACGAACATTACAATATGGCACATAGTAAACGATGTCATGAAAGTAGCT
GACACTGGAATTCAGAAAAAAGGATTATGTCATTCCAGTGCTTGACACTGGAATCCAGCATTTCCATAATCATCAAAACA
TTGTATTTTAACAAAAAACATGTATTTTTATGCTTGCCAACTTAATAAAATTCCTGGATCCCAGTGTCAAGCACTGGGAT
GACAC
The second block is:
TTTTCATCGCTCATGTCCTTAGTTTACCCCCTGTTTCACCATTACATTAATATCTACAGAACCTCCCACTGGGGAGTAGT
AATCTAGGATAGTTTCTATCACTAAAACGCGTGGTATTCCTTTATTTTTTACCAATTTTAAATAAGACAATACCTTATTA
TCATCATAATGCTGCAGAAAGCGGCAAAAGACACCTAATTCATAATTTGTAGCTGATAATTCTTCTTGAGTTATGAGTTT
AATTTTTAAATCTTCTACTGCCTGCCTAGGCACTTTATGTTCGTTGTAATAATATAAGCCTATAGAACCTTTATTGTGTA
TATCAGAATAAGCAAGAAATAAAGAGTGTACGCCAAATAGCAATATATTTTTAGCACCATCTATATTAACCCTAGAATTA
AACTCTTTAGTGTCAAACCTGGAATATCCTAGCAATGCTTGGTAAAACGCTATTTTCCTGTCTTCTGATGTTTCTTTCTC
CTTAAAAAGAATCAAATGAAAATATTGACTCCTGCCTTAAAATATCCGGCATTTTTAACCAATTCTTTTCAGCGGCAACC
CTTGCCCACATTGCTGCTGCTTTAGGAAAAATGGTATTTCTTTAAACACTTACCTTTTGATGAAAGTTGCCCAAAATCCT
TTGTTCTATCCGAATCCAAAACCCCTATTTCCCAAACGCCCCTTAAAACCTTTTTTAAAATTGGAACAAAAAATATTTAA
TTTTTAAAAAAAAACG
The third block is:
TTGNCCATCAATTGGCCACCAGAAAAGTTGCGTCCGTTTACTTCTACACCATGTATAAATGCACCTAAAATCATGCCTTG
GCAAAATGCAGCACCAAGTGACCCAAAATGAAAGGCATAATCCCATAATCGCCTGTATTTTCCTTCTGCCTTAAAACGAA
ACTCAAAGGATACTCCGCGCACTATAAGGCCAAGCAGCATAATAATGATTGGAATATAAAAAGCAGGCATTAATATTGAA
TATGCAAGAGGAAAAGCAGCAAACAACCCTCCACCACCTAGTACCAACCATGTTTCGTTTCCATCCCAAAATGGTGCAAT
TGAGCTTATCATGTGATCACGGCATTTATCTGACGGTGCAAAAGGAAGTAAAATACCAATACCTAAATCAAACCCATCCA
TTAAAATATACAGTAAAACAGCTATGGCAATTAGTAATCCCCAGATTAGGGGTAAATTAATTAAGGAAGAAAAATCAAAC
ATGATTGTTGTCCTTTCCAGATGTACCAGCATCAATCACTGAAGCTCCAATACCGTGTTTATAAAATTGCTCTTCTTCTT
TAATGACAGGAATTCCTTTGTATATAAGTTTCAGAATATAGTATCTACCTGCTCCAAATATAAGGGTATACATAAACGAT
AAATGCAATCAAAGACCATGCAACCTGAGGACCGGTAATCGCAGATGAAAATGATTCAATTGTGCCGTTAATTCCATATA
CAGTGTAAAGTTGACGGCCAATTTCATAGTAAACCAAACTGCAAGTAACGCTATGGACCCCGACGGCATCTTTGAAATCC
ACAATCCTTTGAAAACACAACTTTGGAATAATTTGCCCCGAAAAATACTGAAAAAAAATTTACTGGACCCATTTTGGATT
ATTAAAATTTCAACTCCAACCATTTATACGGG
How can I loop over my file and extract one block in each iteration, so that I can grep for it in the other file?
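One possible way to do the looping part, sketched in Python rather than grep. This is only an illustration: the function name iter_blocks and the sample data are made up for the example, and it assumes every block begins with a line starting with > and that the wrapped sequence lines can be joined into one string (line wrapping may differ between files, so comparing joined sequences is usually safer than comparing line by line).

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_blocks(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for each block introduced by a '>' line.

    The header is returned without the leading '>'; the sequence is the
    concatenation of all lines up to the next '>' line (hypothetical helper).
    """
    header: Optional[str] = None
    parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous block, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header = line[1:]
            parts = []
        elif header is not None and line:
            parts.append(line)
    # Emit the last block, which has no following '>' line.
    if header is not None:
        yield header, "".join(parts)


# Small made-up sample; with a real file, pass the open file object instead,
# e.g. `with open(path) as fh: for header, seq in iter_blocks(fh): ...`
sample = [
    ">seq1 first",
    "ACGT",
    "GGCC",
    ">seq2 second",
    "TTTT",
]

for header, seq in iter_blocks(sample):
    print(header, len(seq))
```

Because iter_blocks is a generator, only one block is held in memory at a time, which matters for a huge input file.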
Edit 1:
For more clarification:
I want to do some operation on each block. First, I run diff between two files and put the result in a new file. For each block in that new file, I want to check whether it is included in the first file or in the second file. If it is included in the first file, I want to extract it to another new file. If it is included in the second file, I want to skip it and go to the next block.
Hope you get my point.
Thanks,
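The workflow described in Edit 1 could be sketched roughly as follows. The names parse_blocks and blocks_only_in_first are hypothetical. The sketch assumes the diff result has already been cleaned up so that it contains plain > header lines followed by sequence lines (raw diff output prefixes lines with "< " and "> ", which would clash with the header marker), and it treats "included" as an exact match of the whole joined sequence, not a substring match.

```python
from typing import Iterable, List, Tuple


def parse_blocks(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (header_line, joined_sequence) pairs for '>'-delimited blocks."""
    blocks: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            blocks.append((line, []))
        elif line and blocks:
            blocks[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in blocks]


def blocks_only_in_first(
    diff_lines: Iterable[str],
    first_lines: Iterable[str],
    second_lines: Iterable[str],
) -> List[Tuple[str, str]]:
    """Keep the blocks from the diff file whose sequence occurs in the first file.

    Assumption: a block counts as "included" when its full sequence equals
    the full sequence of some block in that file.
    """
    first = {seq for _, seq in parse_blocks(first_lines)}
    second = {seq for _, seq in parse_blocks(second_lines)}
    kept = []
    for header, seq in parse_blocks(diff_lines):
        if seq in first:
            kept.append((header, seq))
        elif seq in second:
            continue  # found in the second file: skip to the next block
        # Blocks found in neither file are also dropped.
    return kept


# Made-up data standing in for the three files.
first_file = [">a", "AC", "GT", ">b", "TTTT"]
second_file = [">c", "GGGG"]
diff_file = [">a", "ACGT", ">c", "GG", "GG", ">d", "CCCC"]

for header, seq in blocks_only_in_first(diff_file, first_file, second_file):
    # In a real run, these two lines would be written to the new output file.
    print(header)
    print(seq)
```

Loading the sequences of both files into sets keeps each lookup cheap, at the cost of holding all sequences in memory; for very large files, a different strategy may be needed.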
Do you want to create a separate file for each block and then do some operation on those files? Or do you just want to do some operation (say, search/grep) on each block in each loop iteration? Please clarify your requirement.