
Extract specific text from several metadata files using Python

How can I extract WESTBOUNDINGCOORDINATE, NORTHBOUNDINGCOORDINATE, EASTBOUNDINGCOORDINATE, and SOUTHBOUNDINGCOORDINATE from the text below? The metadata files do not all have these entries on the same line. For example, file one has WESTBOUNDINGCOORDINATE on line 2, but file two has it on line 4. Please help...

    GROUP                  = BOUNDINGRECTANGLE              

    OBJECT                 = WESTBOUNDINGCOORDINATE             
      NUM_VAL              = 1              
      VALUE                = 80.8290376770946               
    END_OBJECT             = WESTBOUNDINGCOORDINATE             

    OBJECT                 = NORTHBOUNDINGCOORDINATE                
      NUM_VAL              = 1              
      VALUE                = 39.9999999964079               
    END_OBJECT             = NORTHBOUNDINGCOORDINATE                

    OBJECT                 = EASTBOUNDINGCOORDINATE             
      NUM_VAL              = 1              
      VALUE                = 104.443461525786               
    END_OBJECT             = EASTBOUNDINGCOORDINATE             

    OBJECT                 = SOUTHBOUNDINGCOORDINATE                
      NUM_VAL              = 1              
      VALUE                = 29.9999999973059               
    END_OBJECT             = SOUTHBOUNDINGCOORDINATE                

  END_GROUP              = BOUNDINGRECTANGLE

My code:

import glob

metafiles = glob.glob("D://*.txt")
for f in metafiles:
    with open(f, 'r') as infile:
        lines = infile.readlines()
        # Hard-coded line index and column slice: breaks when the entry moves
        WESTBOUNDINGCOORDINATE = lines[4][29:45]
        print(WESTBOUNDINGCOORDINATE)

The problem is that the WESTBOUNDINGCOORDINATE value is not always on the same line.

Try iterating through the file, ignoring all empty lines, and looking for lines which begin with the string "OBJECT" and end with the coordinate you want.

For example:

def parse(filepath):
    with open(filepath) as f:
        contents = f.readlines()

    output = {}
    group = {}
    inside_group = False

    for line in contents:
        line = line.strip()
        # Skip blank lines and lines without a KEY = VALUE pair
        if '=' not in line:
            continue

        # Split on the first '=' only, in case a value contains one
        key, value = line.split('=', 1)
        key = key.strip()
        value = value.strip()

        if key == 'OBJECT':
            inside_group = True
        elif key == 'END_OBJECT':
            # Store the collected fields under the object's name
            output[value] = group
            inside_group = False
            group = {}
        elif inside_group:
            group[key] = value

    return output

This should return a dictionary of the following form (the code keeps every value as a string, so convert with float() where you need a number):

>>> parse('file1.txt')
{
    "WESTBOUNDINGCOORDINATE": {
        "NUM_VAL": "1",
        "VALUE": "80.8290376770946"
    },
    "NORTHBOUNDINGCOORDINATE": {
        "NUM_VAL": "1",
        "VALUE": "39.9999999964079"
    },
    # etc
}

You can then grab whichever coordinate you need from the dictionary.
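As a rough sketch of that last step, the snippet below pulls the four bounding coordinates out of each file and converts them to floats. It is self-contained, so it carries its own compact line parser; the names `parse_lines`, `bounding_box`, and `COORDINATES` are made up for this illustration. It assumes each VALUE is a plain number, as in the sample above, and reuses the "D://*.txt" pattern from the question.

```python
import glob

# The four entries the question asks for.
COORDINATES = (
    "WESTBOUNDINGCOORDINATE",
    "NORTHBOUNDINGCOORDINATE",
    "EASTBOUNDINGCOORDINATE",
    "SOUTHBOUNDINGCOORDINATE",
)


def parse_lines(lines):
    """Collect KEY = VALUE pairs found between OBJECT and END_OBJECT lines."""
    output, current, inside = {}, {}, False
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # blank line or no KEY = VALUE pair
        key, value = key.strip(), value.strip()
        if key == "OBJECT":
            inside, current = True, {}
        elif key == "END_OBJECT":
            output[value] = current
            inside = False
        elif inside:
            current[key] = value
    return output


def bounding_box(parsed):
    """Return the four coordinates as floats; a missing entry becomes None."""
    box = {}
    for name in COORDINATES:
        raw = parsed.get(name, {}).get("VALUE")
        # Assumes VALUE is a plain number, as in the sample metadata.
        box[name] = float(raw) if raw is not None else None
    return box


if __name__ == "__main__":
    for path in glob.glob("D://*.txt"):
        with open(path) as infile:
            print(path, bounding_box(parse_lines(infile)))
```

Because the lookup is by object name rather than by line number, it does not matter where in a file each entry appears. A file that lacks one of the entries yields None for it instead of raising an error.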
