Extracting a variable-length number using regex in Python

Question

I have a badly formatted file. I can parse it and extract most of the values I need, except one, and I need your help with a regex to extract a variable-length number.

To parse and extract the other features I used list indexes along with different splitters: '|', ' ' and ':'. In this case, though, I can get as far as the block below, and for each row I have to extract the numbers on either side of the '-' separately as x and y.

One way would be to split first by ':', then by ' ', and finally by '-', and take index positions [0] and [1], but that would be the least efficient way to do it.

    chr5:17399789-17401949 REVERSE
    chr5:6414488-6415907 FORWARD
    chr5:2981156-2982709 FORWARD
    chr5:6311725-6313323 REVERSE
    chr5:12791432-12794551 REVERSE
    chr5:927915-930781 FORWARD
    chr5:19585936-19587841 FORWARD
    chr5:26894856-26896488 FORWARD
    chr5:18138775-18142147 REVERSE
    chr5:20537525-20538943 REVERSE
    chr5:22496196-22500543 REVERSE
    chr5:4747860-4753592 REVERSE

The block above came from a 'bigger block' like this:

Can I extract the values directly from the 'bigger block' as well?

My programming level is best described as beginner, and I need your help.

Thanks

AK

Answer 1

One approach would be to define your regular expression as the following Python raw string:

    numericalBlockRegEx = r'chr\d+:(?P<firstNumBlock>\d+)-(?P<secondNumBlock>\d+)'
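As a quick check, the pattern can be tried on a single row from the question. This is only a minimal sketch: the names sample_line, sample_match and sample_groups are made up for illustration, and compiling the pattern first is a convenience, not a requirement.

```python
import re

# Same pattern as above, compiled once so it can be reused for every line.
numericalBlockRegEx = re.compile(
    r'chr\d+:(?P<firstNumBlock>\d+)-(?P<secondNumBlock>\d+)'
)

# First row of the block in the question (illustrative variable name).
sample_line = 'chr5:17399789-17401949 REVERSE'
sample_match = numericalBlockRegEx.search(sample_line)

# groupdict() maps each named group to the text it captured.
# \d+ matches one or more digits, so the number length does not matter.
sample_groups = sample_match.groupdict()
print(sample_groups)
```

The named groups (?P<name>...) are what make the later lookups by name possible.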

Finally, once you run the regular expression over each line of the file (you will likely need to call search rather than match, since match only looks at the start of the string), you can extract the numerical blocks you're interested in with a simple call to:

    # match is the object returned by re.search(numericalBlockRegEx, line)
    x = match.group('firstNumBlock')   # first number block, as a string
    y = match.group('secondNumBlock')  # second number block, as a string
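Putting the pieces together, something along these lines might work on the larger text as well. It is a sketch under assumptions: the 'bigger block' is not shown in the question, so the leading "gene1 |" text below is invented for illustration, and extract_coordinates and COORD_RE are hypothetical names. It uses re.finditer so the coordinates can sit anywhere in the text, and converts the captured strings to integers.

```python
import re

# Same pattern as in the answer; COORD_RE is an illustrative name.
COORD_RE = re.compile(r'chr\d+:(?P<firstNumBlock>\d+)-(?P<secondNumBlock>\d+)')


def extract_coordinates(text):
    """Return a list of (x, y) integer pairs, one per chrN:x-y occurrence."""
    pairs = []
    # finditer scans the whole string, so any text before or after
    # the coordinates is simply skipped.
    for match in COORD_RE.finditer(text):
        x = int(match.group('firstNumBlock'))
        y = int(match.group('secondNumBlock'))
        pairs.append((x, y))
    return pairs


# Rows taken from the question; the "geneN |" prefixes are made up
# to stand in for the unknown surrounding content.
text = """\
gene1 | chr5:17399789-17401949 REVERSE
gene2 | chr5:6414488-6415907 FORWARD
chr5:927915-930781 FORWARD
"""
coordinates = extract_coordinates(text)
print(coordinates)
```

For a real file, the string passed in could be the result of reading the file, or the function could be called once per line.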

Cheers!

Answer 1 (3, accepted, 2012-02-29 05:05:36)