
Extract specific text from several metadata files using Python

How can I extract the WESTBOUNDINGCOORDINATE, NORTHBOUNDINGCOORDINATE, EASTBOUNDINGCOORDINATE, and SOUTHBOUNDINGCOORDINATE values from the text below? The fields are not on the same line in every metadata file: for example, file one has
WESTBOUNDINGCOORDINATE on line 2, but file two has it on line 4. Please help.

    GROUP                  = BOUNDINGRECTANGLE              

    OBJECT                 = WESTBOUNDINGCOORDINATE             
      NUM_VAL              = 1              
      VALUE                = 80.8290376770946               
    END_OBJECT             = WESTBOUNDINGCOORDINATE             

    OBJECT                 = NORTHBOUNDINGCOORDINATE                
      NUM_VAL              = 1              
      VALUE                = 39.9999999964079               
    END_OBJECT             = NORTHBOUNDINGCOORDINATE                

    OBJECT                 = EASTBOUNDINGCOORDINATE             
      NUM_VAL              = 1              
      VALUE                = 104.443461525786               
    END_OBJECT             = EASTBOUNDINGCOORDINATE             

    OBJECT                 = SOUTHBOUNDINGCOORDINATE                
      NUM_VAL              = 1              
      VALUE                = 29.9999999973059               
    END_OBJECT             = SOUTHBOUNDINGCOORDINATE                

  END_GROUP              = BOUNDINGRECTANGLE

My code:

import glob

metafiles = glob.glob("D://*.txt")
for f in metafiles:
    with open(f, 'r') as infile:
        lines = infile.readlines()
        # Hard-coded line index and column slice: only works for one file layout
        WESTBOUNDINGCOORDINATE = lines[4][29:45]
        print(WESTBOUNDINGCOORDINATE)

The problem is that the WESTBOUNDINGCOORDINATE value is not always on the same line.

Try iterating through the file, ignoring all empty lines, and looking for lines which begin with the string "OBJECT" and end with the coordinate you want.

For example:

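def parse(filepath):
    with open(filepath) as f:
        contents = f.readlines()

    output = {}
    group = {}
    inside_group = False

    for line in contents:
        # Skip blank lines and lines without a KEY = VALUE pair (e.g. a bare END)
        if '=' not in line:
            continue

        # Split on the first '=' only, so a value containing '=' stays intact
        key, value = line.split('=', 1)
        key = key.strip()
        value = value.strip()

        if key == 'OBJECT':
            # Start collecting the fields of a new object
            inside_group = True
        elif key == 'END_OBJECT':
            # Store the collected fields under the object's name
            output[value] = group
            inside_group = False
            group = {}
        elif inside_group:
            group[key] = value

    return output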

This should return a dictionary in the form below (the values are strings, since they are read straight from the file):

>>> parse('file1.txt')
{
    "WESTBOUNDINGCOORDINATE": {
        "NUM_VAL": "1",
        "VALUE": "80.8290376770946"
    },
    "NORTHBOUNDINGCOORDINATE": {
        "NUM_VAL": "1",
        "VALUE": "39.9999999964079"
    },
    # etc
}

You can then grab whichever coordinate you need from the dictionary.
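
As a rough sketch of that last step, the snippet below pulls the four coordinates out of a dictionary shaped like the one above and converts them to floats. The `bounding_box` helper, the `SIDES` tuple, and the sample `parsed` dictionary are hypothetical names introduced only for illustration; `float()` is used because values read from a text file may still be strings.

```python
# Hypothetical sample in the shape parse() is described as returning.
# The values are strings here, as they would be when read from a text file.
parsed = {
    "WESTBOUNDINGCOORDINATE": {"NUM_VAL": "1", "VALUE": "80.8290376770946"},
    "NORTHBOUNDINGCOORDINATE": {"NUM_VAL": "1", "VALUE": "39.9999999964079"},
    "EASTBOUNDINGCOORDINATE": {"NUM_VAL": "1", "VALUE": "104.443461525786"},
    "SOUTHBOUNDINGCOORDINATE": {"NUM_VAL": "1", "VALUE": "29.9999999973059"},
}

SIDES = ("WEST", "NORTH", "EAST", "SOUTH")


def bounding_box(parsed):
    """Return a dict like {'WEST': 80.8, ...} for the sides that are present."""
    box = {}
    for side in SIDES:
        entry = parsed.get(side + "BOUNDINGCOORDINATE")
        # Skip sides that are missing or have no VALUE field
        if entry is not None and "VALUE" in entry:
            # float() accepts both strings and numbers
            box[side] = float(entry["VALUE"])
    return box


box = bounding_box(parsed)
print(box["WEST"], box["NORTH"], box["EAST"], box["SOUTH"])
```

Because the lookup is by object name rather than by line number, it does not matter where in each file the coordinate appears.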
