Using Python, I want to extract and print the blocks whose first line contains 10917, 11396, or 1116920
Each block starts with hg19 and ends with whitespace. Can I use regular expressions to extract the required blocks?
hg19.chr1 10917 479
panTro2.chr15 13606 455
hg19.chr1 11396 93
panTro2.chr15 14061 42
bosTau4.chr5 113864279 105
hg19.chr1 11489 81
panTro2.chr15 14103 81
bosTau4.chr5 113864398 80
equCab2.chr6 54105327 83
canFam2.chr27 45128907 82
calJac1.Contig8673 78513 67
hg19.chr1 1116920 38
panTro2.chr1 1103202 38
gorGor1.Supercontig_0004540 23214 38
ponAbe2.chr1 534356 38
papHam1.scaffold19767 38455 38
calJac1.Contig4288 217257 29
micMur1.scaffold_101519 296 37
dipOrd1.scaffold_7421 49811 22
cavPor3.scaffold_186 248497 22
bosTau4.chr16 29320296 47
equCab2.chr2 72413055 53
felCat3.scaffold_124042 293309 9
hg19.chr1 1116863 57
papHam1.scaffold19767 38399 56
ponAbe2.chr1 534300 56
and so on...
I've tried various regular expressions, but wasn't successful.
The following reads your data from a file called input.txt. It then creates a list containing all of the blocks, filters this list so that it contains only the required entries, and displays them:
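import re

# Read the whole file into memory.
with open('input.txt') as f_input:
    data = f_input.read()

# A block runs from a line starting with "hg19." up to the next such line (or the end of the data).
blocks = re.findall(r'(^hg19\..*?)\n*?(?=^hg19\.|\Z)', data, re.S | re.M)
allowed = {"10917", "11396", "1116920"}
# Keep a block only if the second field of its first line is one of the allowed values.
blocks = [block for block in blocks if block.split('\n', 1)[0].split()[1] in allowed]

for block in blocks:
    print(block)
    print('----')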
This would display the following:
hg19.chr1 10917 479
panTro2.chr15 13606 455
----
hg19.chr1 11396 93
panTro2.chr15 14061 42
bosTau4.chr5 113864279 105
----
hg19.chr1 1116920 38
panTro2.chr1 1103202 38
gorGor1.Supercontig_0004540 23214 38
ponAbe2.chr1 534356 38
papHam1.scaffold19767 38455 38
calJac1.Contig4288 217257 29
micMur1.scaffold_101519 296 37
dipOrd1.scaffold_7421 49811 22
cavPor3.scaffold_186 248497 22
bosTau4.chr16 29320296 47
equCab2.chr2 72413055 53
felCat3.scaffold_124042 293309 9
----
This assumes that your file is small enough to fit comfortably into memory at once. Tested using Python 2.7.6.
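If the file is too large to read in one go, the same filtering can probably be done line by line without a regular expression. The following is only a sketch, not part of the tested answer above: the helper names iter_blocks and wanted_blocks are made up for illustration, and it assumes that every block begins with a line starting with "hg19." and that blank lines only act as separators.

```python
def iter_blocks(lines, prefix="hg19."):
    """Yield each block as a list of lines.

    A new block begins at every line that starts with `prefix`
    (assumed to be "hg19." as in the question's data).
    """
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(prefix) and block:
            # A new header line closes the block collected so far.
            yield block
            block = []
        if line.strip():  # ignore blank separator lines
            block.append(line)
    if block:
        yield block


def wanted_blocks(lines, allowed, prefix="hg19."):
    """Yield only blocks whose header line has an allowed second field."""
    for block in iter_blocks(lines, prefix):
        fields = block[0].split()
        if block[0].startswith(prefix) and len(fields) > 1 and fields[1] in allowed:
            yield block
```

A file object can be passed directly as lines, for example wanted_blocks(open('input.txt'), {"10917", "11396", "1116920"}), and each yielded block printed with "\n".join(block). Only one block is held in memory at a time.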