Using Python, I want to extract and print the blocks that contain 10917, 11396 and 1116920 in the first line of each block

Each block starts with hg19 and ends with a blank line. Can I use regular expressions to extract the required blocks?

hg19.chr1 10917 479
panTro2.chr15 13606 455

hg19.chr1 11396 93
panTro2.chr15 14061 42
bosTau4.chr5 113864279 105

hg19.chr1 11489 81
panTro2.chr15 14103 81
bosTau4.chr5 113864398 80
equCab2.chr6 54105327 83
canFam2.chr27 45128907 82
calJac1.Contig8673 78513 67

hg19.chr1 1116920 38
panTro2.chr1 1103202 38
gorGor1.Supercontig_0004540 23214 38
ponAbe2.chr1 534356 38
papHam1.scaffold19767 38455 38
calJac1.Contig4288 217257 29
micMur1.scaffold_101519 296 37
dipOrd1.scaffold_7421 49811 22
cavPor3.scaffold_186 248497 22
bosTau4.chr16 29320296 47
equCab2.chr2 72413055 53
felCat3.scaffold_124042 293309 9

hg19.chr1 1116863 57
papHam1.scaffold19767 38399 56
ponAbe2.chr1 534300 56


and so on...

I've tried various regular expressions, but without success.

The following reads your data from a file called input.txt, builds a list of all the blocks, filters that list down to the required entries, and then displays them:

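import re

# Read the whole file into memory.
with open('input.txt') as f_input:
    data = f_input.read()

# Each block runs from a line starting with "hg19." up to the next such line
# (or the end of the data); trailing newlines are left out of the capture.
blocks = re.findall(r'(^hg19\..*?)\n*?(?=^hg19\.|\Z)', data, re.S | re.M)

# Keep only the blocks whose first line has an allowed second field.
allowed = {"10917", "11396", "1116920"}
blocks = [block for block in blocks if block.split('\n', 1)[0].split()[1] in allowed]

for block in blocks:
    print(block)    # the parenthesised form works on Python 2 and 3
    print('----')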

This would display the following:

hg19.chr1 10917 479
panTro2.chr15 13606 455
----
hg19.chr1 11396 93
panTro2.chr15 14061 42
bosTau4.chr5 113864279 105
----
hg19.chr1 1116920 38
panTro2.chr1 1103202 38
gorGor1.Supercontig_0004540 23214 38
ponAbe2.chr1 534356 38
papHam1.scaffold19767 38455 38
calJac1.Contig4288 217257 29
micMur1.scaffold_101519 296 37
dipOrd1.scaffold_7421 49811 22
cavPor3.scaffold_186 248497 22
bosTau4.chr16 29320296 47
equCab2.chr2 72413055 53
felCat3.scaffold_124042 293309 9
----

This assumes that your file is small enough to fit comfortably into memory at once. Tested using Python 2.7.6.
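If the file is too large to read in one go, a line-by-line variant is possible. The following is only a sketch, not part of the tested answer: it assumes blocks are separated by one or more blank lines (as in the sample data), and the names iter_blocks, wanted_blocks and print_wanted are made up for illustration. It uses itertools.groupby instead of a regular expression, so only one block is held in memory at a time.

```python
from itertools import groupby

ALLOWED = {"10917", "11396", "1116920"}  # second-field values to keep


def iter_blocks(lines):
    """Yield each blank-line-separated block as a list of stripped lines."""
    stripped = (line.strip() for line in lines)
    # groupby clusters consecutive lines by whether they are non-empty,
    # so each run of non-empty lines is one block.
    for has_text, group in groupby(stripped, key=bool):
        if has_text:
            yield list(group)


def wanted_blocks(lines, allowed=ALLOWED):
    """Yield blocks whose first line starts with hg19. and has an allowed second field."""
    for block in iter_blocks(lines):
        fields = block[0].split()
        if len(fields) > 1 and fields[0].startswith("hg19.") and fields[1] in allowed:
            yield block


def print_wanted(path):
    """Stream the file at `path` and print the matching blocks."""
    with open(path) as f_input:
        for block in wanted_blocks(f_input):
            print("\n".join(block))
            print("----")
```

Calling print_wanted('input.txt') should print the same three blocks as the regular-expression version, as long as the blank-line assumption holds for your file.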
