简体   繁体   English

如何从文本文件“块”中循环提取行

[英]How to extract lines from text file “block” in loop

I have a huge text file and I want to use grep to search if some "blocks" in my text file is existing in another file. 我有一个巨大的文本文件,我想使用grep搜索文本文件中的“块”是否存在于另一个文件中。 So, I need to extract these blocks first. 因此,我需要首先提取这些块。

This is my file: 这是我的文件:

>gi|60117238|gb|AY897435.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TCTGTTGCGAGTGTGCTGATAACTACTGAATCTATGATAGTTGATGTACCAAGCAAAGAAAATGCTTCATCTCCTATGGG
TGCAGGAGAAATGAGTGGCATGGGTGGATTCTAAGTAGAATGAAACCGTGGAGCAATTGCTCCACGGTAGTTCCAAAAAA
TCTCACATTTTACTATTCGTTAAAGGTAATACGTTTGGTGCAGAAATGCACTACTGTTTGCATCCGTTTCGCTCCTTTAT
ATTGTGGTTGTCTAATAACAAAAAGGCAGCATAAGAAAACTATAACACCTAGTATATTTATACTATAGCTGACCCAAGCA
ACACGTCATACCGCGATTCATTCCACAACTGTACGAACATTACAATATGGCACATAGTAAACGATGTCATGAAAGTAGCT
GACACTGGAATTCAGAAAAAAGGATTATGTCATTCCAGTGCTTGACACTGGAATCCAGCATTTCCATAATCATCAAAACA
TTGTATTTTAACAAAAAACATGTATTTTTATGCTTGCCAACTTAATAAAATTCCTGGATCCCAGTGTCAAGCACTGGGAT
GACAC
>gi|60117239|gb|AY897436.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TTTTCATCGCTCATGTCCTTAGTTTACCCCCTGTTTCACCATTACATTAATATCTACAGAACCTCCCACTGGGGAGTAGT
AATCTAGGATAGTTTCTATCACTAAAACGCGTGGTATTCCTTTATTTTTTACCAATTTTAAATAAGACAATACCTTATTA
TCATCATAATGCTGCAGAAAGCGGCAAAAGACACCTAATTCATAATTTGTAGCTGATAATTCTTCTTGAGTTATGAGTTT
AATTTTTAAATCTTCTACTGCCTGCCTAGGCACTTTATGTTCGTTGTAATAATATAAGCCTATAGAACCTTTATTGTGTA
TATCAGAATAAGCAAGAAATAAAGAGTGTACGCCAAATAGCAATATATTTTTAGCACCATCTATATTAACCCTAGAATTA
AACTCTTTAGTGTCAAACCTGGAATATCCTAGCAATGCTTGGTAAAACGCTATTTTCCTGTCTTCTGATGTTTCTTTCTC
CTTAAAAAGAATCAAATGAAAATATTGACTCCTGCCTTAAAATATCCGGCATTTTTAACCAATTCTTTTCAGCGGCAACC
CTTGCCCACATTGCTGCTGCTTTAGGAAAAATGGTATTTCTTTAAACACTTACCTTTTGATGAAAGTTGCCCAAAATCCT
TTGTTCTATCCGAATCCAAAACCCCTATTTCCCAAACGCCCCTTAAAACCTTTTTTAAAATTGGAACAAAAAATATTTAA
TTTTTAAAAAAAAACG
>gi|60117240|gb|AY897437.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TTGNCCATCAATTGGCCACCAGAAAAGTTGCGTCCGTTTACTTCTACACCATGTATAAATGCACCTAAAATCATGCCTTG
GCAAAATGCAGCACCAAGTGACCCAAAATGAAAGGCATAATCCCATAATCGCCTGTATTTTCCTTCTGCCTTAAAACGAA
ACTCAAAGGATACTCCGCGCACTATAAGGCCAAGCAGCATAATAATGATTGGAATATAAAAAGCAGGCATTAATATTGAA
TATGCAAGAGGAAAAGCAGCAAACAACCCTCCACCACCTAGTACCAACCATGTTTCGTTTCCATCCCAAAATGGTGCAAT
TGAGCTTATCATGTGATCACGGCATTTATCTGACGGTGCAAAAGGAAGTAAAATACCAATACCTAAATCAAACCCATCCA
TTAAAATATACAGTAAAACAGCTATGGCAATTAGTAATCCCCAGATTAGGGGTAAATTAATTAAGGAAGAAAAATCAAAC
ATGATTGTTGTCCTTTCCAGATGTACCAGCATCAATCACTGAAGCTCCAATACCGTGTTTATAAAATTGCTCTTCTTCTT
TAATGACAGGAATTCCTTTGTATATAAGTTTCAGAATATAGTATCTACCTGCTCCAAATATAAGGGTATACATAAACGAT
AAATGCAATCAAAGACCATGCAACCTGAGGACCGGTAATCGCAGATGAAAATGATTCAATTGTGCCGTTAATTCCATATA
CAGTGTAAAGTTGACGGCCAATTTCATAGTAAACCAAACTGCAAGTAACGCTATGGACCCCGACGGCATCTTTGAAATCC
ACAATCCTTTGAAAACACAACTTTGGAATAATTTGCCCCGAAAAATACTGAAAAAAAATTTACTGGACCCATTTTGGATT
ATTAAAATTTCAACTCCAACCATTTATACGGG

Block is starting from > to the letter befor the next >. 块从>到下一个>之前的字母开始。

So, 1st block is: 因此,第一个块是:

TCTGTTGCGAGTGTGCTGATAACTACTGAATCTATGATAGTTGATGTACCAAGCAAAGAAAATGCTTCATCTCCTATGGG
TGCAGGAGAAATGAGTGGCATGGGTGGATTCTAAGTAGAATGAAACCGTGGAGCAATTGCTCCACGGTAGTTCCAAAAAA
TCTCACATTTTACTATTCGTTAAAGGTAATACGTTTGGTGCAGAAATGCACTACTGTTTGCATCCGTTTCGCTCCTTTAT
ATTGTGGTTGTCTAATAACAAAAAGGCAGCATAAGAAAACTATAACACCTAGTATATTTATACTATAGCTGACCCAAGCA
ACACGTCATACCGCGATTCATTCCACAACTGTACGAACATTACAATATGGCACATAGTAAACGATGTCATGAAAGTAGCT
GACACTGGAATTCAGAAAAAAGGATTATGTCATTCCAGTGCTTGACACTGGAATCCAGCATTTCCATAATCATCAAAACA
TTGTATTTTAACAAAAAACATGTATTTTTATGCTTGCCAACTTAATAAAATTCCTGGATCCCAGTGTCAAGCACTGGGAT
GACAC

2nd block is: 第二块是:

TTTTCATCGCTCATGTCCTTAGTTTACCCCCTGTTTCACCATTACATTAATATCTACAGAACCTCCCACTGGGGAGTAGT
AATCTAGGATAGTTTCTATCACTAAAACGCGTGGTATTCCTTTATTTTTTACCAATTTTAAATAAGACAATACCTTATTA
TCATCATAATGCTGCAGAAAGCGGCAAAAGACACCTAATTCATAATTTGTAGCTGATAATTCTTCTTGAGTTATGAGTTT
AATTTTTAAATCTTCTACTGCCTGCCTAGGCACTTTATGTTCGTTGTAATAATATAAGCCTATAGAACCTTTATTGTGTA
TATCAGAATAAGCAAGAAATAAAGAGTGTACGCCAAATAGCAATATATTTTTAGCACCATCTATATTAACCCTAGAATTA
AACTCTTTAGTGTCAAACCTGGAATATCCTAGCAATGCTTGGTAAAACGCTATTTTCCTGTCTTCTGATGTTTCTTTCTC
CTTAAAAAGAATCAAATGAAAATATTGACTCCTGCCTTAAAATATCCGGCATTTTTAACCAATTCTTTTCAGCGGCAACC
CTTGCCCACATTGCTGCTGCTTTAGGAAAAATGGTATTTCTTTAAACACTTACCTTTTGATGAAAGTTGCCCAAAATCCT
TTGTTCTATCCGAATCCAAAACCCCTATTTCCCAAACGCCCCTTAAAACCTTTTTTAAAATTGGAACAAAAAATATTTAA
TTTTTAAAAAAAAACG

Third block: 第三块:

TTGNCCATCAATTGGCCACCAGAAAAGTTGCGTCCGTTTACTTCTACACCATGTATAAATGCACCTAAAATCATGCCTTG
GCAAAATGCAGCACCAAGTGACCCAAAATGAAAGGCATAATCCCATAATCGCCTGTATTTTCCTTCTGCCTTAAAACGAA
ACTCAAAGGATACTCCGCGCACTATAAGGCCAAGCAGCATAATAATGATTGGAATATAAAAAGCAGGCATTAATATTGAA
TATGCAAGAGGAAAAGCAGCAAACAACCCTCCACCACCTAGTACCAACCATGTTTCGTTTCCATCCCAAAATGGTGCAAT
TGAGCTTATCATGTGATCACGGCATTTATCTGACGGTGCAAAAGGAAGTAAAATACCAATACCTAAATCAAACCCATCCA
TTAAAATATACAGTAAAACAGCTATGGCAATTAGTAATCCCCAGATTAGGGGTAAATTAATTAAGGAAGAAAAATCAAAC
ATGATTGTTGTCCTTTCCAGATGTACCAGCATCAATCACTGAAGCTCCAATACCGTGTTTATAAAATTGCTCTTCTTCTT
TAATGACAGGAATTCCTTTGTATATAAGTTTCAGAATATAGTATCTACCTGCTCCAAATATAAGGGTATACATAAACGAT
AAATGCAATCAAAGACCATGCAACCTGAGGACCGGTAATCGCAGATGAAAATGATTCAATTGTGCCGTTAATTCCATATA
CAGTGTAAAGTTGACGGCCAATTTCATAGTAAACCAAACTGCAAGTAACGCTATGGACCCCGACGGCATCTTTGAAATCC
ACAATCCTTTGAAAACACAACTTTGGAATAATTTGCCCCGAAAAATACTGAAAAAAAATTTACTGGACCCATTTTGGATT
ATTAAAATTTCAACTCCAACCATTTATACGGG

How can I loop my file and extract one block in each iteration? 如何循环我的文件并在每次迭代中提取一个块? To grep it with the other file? 要与其他文件一起grep吗?

Edit 1: 编辑1:

For more clarification: 有关更多说明:

I want to do some operation on each block. 我想对每个块进行一些操作。 First, I perform diff between two files and but the result in a new file. 首先,我在两个文件之间执行diff,但是结果是一个新文件。 For the new file which contains the blocks, i want to search if each block is included in the first file or in the second file. 对于包含这些块的新文件,我想搜索每个块是否包含在第一个文件或第二个文件中。 If it is included in the first file, i want to extract it to another new file. 如果它包含在第一个文件中,我想将其提取到另一个新文件中。 If it is included in the second file, i want to escape and go to next block. 如果它包含在第二个文件中,我想转义并转到下一个块。

Hope you getting my point. 希望你明白我的意思。

Thanks, 谢谢,

Do you want to create a separate file for each block? 是否要为每个块创建一个单独的文件? And then you want to do any operation on those files? 然后您要对这些文件执行任何操作? Or you just want to do some operation(say search/grep) for each block in each loop iteration? 还是只想对每个循环迭代中的每个块执行一些操作(例如search / grep)? Please clarify your requirement. 请阐明您的要求。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM