Python: Using re module to find string, then print values under string

I am attempting to use the re module to search a fairly large file. The file I am searching has the following format:

      220
      BOX 1,  STEP 1
      C        15.1760586379       13.7666285127        4.1579861659
      F        13.7752750995       13.3845518556        4.1992254467
      F        15.1122807811       15.0753387163        3.8457966464
      H        15.5298304628       13.5873563855        5.1615910859
      H        15.6594416869       13.1246597008        3.3754112615
        5
     BOX 2,  STEP 1
     C        15.1760586379       13.7666285127        4.1579861659
     F        13.7752750995       13.3845518556        4.1992254467
     F        15.1122807811       15.0753387163        3.8457966464
     H        15.5298304628       13.5873563855        5.1615910859
     H        15.6594416869       13.1246597008        3.3754112615
       240
     BOX 1,  STEP 2
     C        12.6851133069        2.8636250164        1.1788963097
     F        11.7935769268        1.7912366066        1.3042188034
     F        13.7887138736        2.3739304018        0.4126088380
     H        12.1153838312        3.7024696077        0.7164304431
     H        13.0962656950        3.1549047758        2.1436863477
     C        12.6745394723        3.6338848332       15.1374252921
     F        11.8703828307        4.3473226569       16.0480492173
     F        12.2304604843        2.3709059503       14.9433964493
     H        12.6002811971        4.1968554204       14.1449118786
     H        13.7469256153        3.6086212350       15.5204655285

This format continues for Box 1 and Box 2 for ~30000 STEPs in total for each BOX. I have code that uses the re module to search this file based on the keyword "STEP". Unfortunately, it does not yield any results when I run it. I need my code to 1) search for ONLY Box 1, then 2) print/output all the coordinates (preferably omitting the "C", "F", and "H" labels, so only the coordinates) beginning after STEP 1 to a file, and 3) increment the "STEP" number by 48 and then repeat 2). I also want to ignore the "5" and the "240" in the file that I am searching, so the code should make sure these are not included in the output. This is what I have thus far (it does not work):

 import re
 shakes = open("mc_coordinates", "r")
 i = 1
 for line in shakes:
        if re.match("(.*)STEP i(.*)", line):
               print line
        i+=48

This is an example of what I want my code to do:

  STEP 1
    15.1760586379       13.7666285127        4.1579861659
    13.7752750995       13.3845518556        4.1992254467
    15.1122807811       15.0753387163        3.8457966464
    15.5298304628       13.5873563855        5.1615910859
    15.6594416869       13.1246597008        3.3754112615  
  STEP 49
    12.6851133069        2.8636250164        1.1788963097
    11.7935769268        1.7912366066        1.3042188034
    13.7887138736        2.3739304018        0.4126088380
    12.1153838312        3.7024696077        0.7164304431
    13.0962656950        3.1549047758        2.1436863477
    12.6745394723        3.6338848332       15.1374252921
    11.8703828307        4.3473226569       16.0480492173
    12.2304604843        2.3709059503       14.9433964493
    12.6002811971        4.1968554204       14.1449118786
    13.7469256153        3.6086212350       15.5204655285
  STEP 97
    15.1760586379       13.7666285127        4.1579861659
    13.7752750995       13.3845518556        4.1992254467
    15.1122807811       15.0753387163        3.8457966464
    15.5298304628       13.5873563855        5.1615910859
    15.6594416869       13.1246597008        3.3754112615  

It should be noted that this is a condensed version; typically there will be ~250 lines of coordinates in between "STEP" numbers. Any ideas or thoughts will be appreciated. Thanks!!

A quick, although maybe not efficient, way is to just parse line by line and keep some state.

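# sketch of the idea, rewritten in Python 3 syntax
import re

# matches header lines such as "BOX 1,  STEP 49"; captures box and step numbers
header_re = re.compile(r"\s*BOX\s+(\d+),\s+STEP\s+(\d+)")
i = 1           # next STEP number to output
output = False  # are we in a block that should be output?
with open("mc_coordinates", "r") as shakes:
    for line in shakes:
        header = header_re.match(line)
        if header and header.group(1) == "1" and int(header.group(2)) == i:
            print("STEP", i)
            output = True
            i += 48
        elif header:
            # some other box or step
            output = False
        elif output and len(line.split()) == 4:
            # drop the C, F or H label; count lines such as "240" are skipped
            print(*line.split()[1:])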
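
One likely reason the code in the question prints nothing is that the `i` inside the pattern string is the literal letter "i", not the loop variable. A minimal sketch of building the pattern from the variable instead, assuming the header lines look like the sample in the question (the helper name `step_pattern` is made up for illustration):

```python
import re

def step_pattern(step):
    """Return a compiled pattern for the 'BOX 1,  STEP <step>' header line."""
    # \s+ tolerates the variable spacing; \b keeps STEP 1 from matching STEP 10.
    return re.compile(r"\s*BOX\s+1,\s+STEP\s+{}\b".format(step))

literal = re.compile("(.*)STEP i(.*)")  # the pattern from the question
line = "      BOX 1,  STEP 1\n"

print(bool(literal.match(line)))          # False: the line has no literal "STEP i"
print(bool(step_pattern(1).match(line)))  # True
print(bool(step_pattern(1).match("     BOX 1,  STEP 10\n")))  # False
```

Compiling a new pattern for every step is fine for a sketch; for ~30000 steps it is probably simpler to capture the step number once with `(\d+)` and compare it as an integer.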

It seems like the easiest way to do this would be to have two regex patterns: 1. Find the 'BOX 1, STEP 48N+1' string. 2. Get the coordinates.

I'm providing some code below. I haven't tried it on your data, but it should be easy to fix any bugs. Basically, what you need is a small state machine that tells you when you should and should not print out the coordinates.

import re

step_re = re.compile(r'BOX 1,\s+STEP (\d+)')
coord_re = re.compile(r'\s*(\d+\.\d+)' * 3)  # three decimal numbers (dot escaped)
in_step = False
with open('your_file.txt') as f:  # text mode, so the str patterns apply
  for line in f:
    if in_step:
      coord_match = coord_re.search(line)
      if coord_match:
        print(coord_match.group(1), coord_match.group(2), coord_match.group(3))
      else:
        in_step = False  # a count line such as "240" ends the block
      continue

    step_match = step_re.search(line)  # search, because header lines are indented
    if step_match and (int(step_match.group(1)) % 48) == 1:
      print('STEP {}'.format(step_match.group(1)))
      in_step = True
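
The same state machine can also be written as a generator, which makes it easy to try on a small in-memory sample before pointing it at the real file. This is only a rough sketch: the function name `extract_steps`, the `stride` parameter, and the output file name `box1_coordinates.txt` are placeholders rather than anything from the question, and it assumes all coordinates are unsigned decimals as in the sample.

```python
import io
import re

STEP_RE = re.compile(r"BOX 1,\s+STEP (\d+)")
COORD_RE = re.compile(r"\s*(\d+\.\d+)" * 3)

def extract_steps(lines, stride=48):
    """Yield output lines for BOX 1 at STEP 1, 1+stride, 1+2*stride, ..."""
    in_step = False
    for line in lines:
        step = STEP_RE.search(line)
        if step:
            # A BOX 1 header either opens a wanted block or closes the current one.
            in_step = (int(step.group(1)) - 1) % stride == 0
            if in_step:
                yield "STEP {}".format(step.group(1))
        elif in_step:
            coords = COORD_RE.search(line)
            if coords:
                yield "  ".join(coords.groups())  # atom label is dropped
            else:
                in_step = False  # count line or BOX 2 header ends the block

sample = io.StringIO(
    "  220\n"
    "  BOX 1,  STEP 1\n"
    "  C   15.17   13.76   4.15\n"
    "  F   13.77   13.38   4.19\n"
    "    5\n"
    "  BOX 2,  STEP 1\n"
    "  C   15.17   13.76   4.15\n"
    "  240\n"
    "  BOX 1,  STEP 2\n"
    "  C   12.68   2.86   1.17\n"
)
result = list(extract_steps(sample))
print(result)

# To process the real file and write the result to disk:
# with open("mc_coordinates") as src, open("box1_coordinates.txt", "w") as dst:
#     dst.writelines(out + "\n" for out in extract_steps(src))
```

Because the generator reads one line at a time, the whole file never has to fit in memory, which matters for ~30000 steps of ~250 lines each.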
