
Python: use a regex to match lines that start with a dot and store them in a dict

#-----------------------------------------------------------------------------------    
from pprint import pprint

data = '''
. 
.
.
#Long log file
 -------------------------------------------------------------------------------
 Section Name                   | Budget    | Size      | Prev Size | Overflow
 --------------------------------+-----------+-----------+-----------+----------
  .text.resident                 |    712924 |    794576 |    832688 | YES
  .rodata.resident               |     77824 |     77560 |     21496 | YES
  .data.resident                 |     28672 |     28660 |     42308 | NO
  .bss.resident                  |     52672 |   1051632 |   1455728 | YES 
  
.
.
.
  
'''

Expected output:

MEMDICT = {'.text.resident':   {'Budget': '712924', 'Size': '794576', 'Prev Size': '832688', 'Overflow': 'YES'},
           '.rodata.resident': {'Budget': '77824', 'Size': '77560', 'Prev Size': '21496', 'Overflow': 'YES'},
           '.data.resident':   {'Budget': '28672', 'Size': '28660', 'Prev Size': '42308', 'Overflow': 'NO'},
           '.bss.resident':    {'Budget': '52672', 'Size': '1051632', 'Prev Size': '1455728', 'Overflow': 'YES'}}

I am a beginner in Python. Please suggest some simple steps.

Logic:

  • Search for a regex pattern and get the headers in a list
pattern = re.compile(r'\sSection Name\s|\sBudget*')  # This could be improved
if pattern.match(line):
    key_list = (''.join(line.split())).split('|')  # Could not handle the spaces, so I removed them all before splitting
  • Search for a regex pattern that matches .something.resident | \d+ | \d+ | \d+ | followed by the Overflow value, and collect the values in value_list (this is the step I need help with)

  • Combine the lists into the dict in a loop

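mem_info = {}  # reset the dict for each row
for i in range(1, len(key_list)):  # start at 1 to skip the 'Section Name' column
    mem_info[key_list[i]] = value_list[i]
MEMDICT[value_list[0]] = mem_info  # the section name (e.g. .text.resident) is the key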
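A minimal regex-based sketch of the three steps listed above, in case it helps. It assumes that every section name starts with a dot, that the Budget, Size and Prev Size columns are plain integers, and that Overflow is always YES or NO. The names parse_sections, HEADER_RE and ROW_RE are hypothetical, chosen only for this illustration. Values are kept as strings to match the expected output.

```python
import re

# Sample taken from the question (the rest of the log is omitted).
data = '''
 -------------------------------------------------------------------------------
 Section Name                   | Budget    | Size      | Prev Size | Overflow
 --------------------------------+-----------+-----------+-----------+----------
  .text.resident                 |    712924 |    794576 |    832688 | YES
  .rodata.resident               |     77824 |     77560 |     21496 | YES
  .data.resident                 |     28672 |     28660 |     42308 | NO
  .bss.resident                  |     52672 |   1051632 |   1455728 | YES
'''

# Header row: recognised by its first cell (hypothetical pattern name).
HEADER_RE = re.compile(r'^\s*Section Name\s*\|')
# Data row (assumption): a name starting with '.', three integers, then YES/NO.
ROW_RE = re.compile(
    r'^\s*(\.\S+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(YES|NO)\s*$'
)


def parse_sections(text):
    key_list = []
    memdict = {}
    for line in text.splitlines():
        if HEADER_RE.match(line):
            # Split on '|' and strip each cell, so 'Prev Size' keeps its space.
            key_list = [cell.strip() for cell in line.split('|')]
            continue
        m = ROW_RE.match(line)
        if m and key_list:
            value_list = list(m.groups())
            # Pair column names with values, skipping column 0 (the section name).
            memdict[value_list[0]] = dict(zip(key_list[1:], value_list[1:]))
    return memdict


MEMDICT = parse_sections(data)
print(MEMDICT['.text.resident'])
# {'Budget': '712924', 'Size': '794576', 'Prev Size': '832688', 'Overflow': 'YES'}
```

Splitting the header on '|' and stripping each cell avoids the space problem from the first step, and zip() replaces the index loop from the last step.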

The only thing you haven't shown us is what line ends the section. Other than that, this is what you need:

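keeper = False  # becomes True once the header row has been seen
memdict = {}
with open(filename) as f:  # filename: path to your log file
    for line in f:
        if not keeper:
            if 'Section Name' in line:
                keeper = True
            continue
        if '-------------------' in line:
            continue  # skip the separator row under the header
        if 'whatever ends the section' in line:
            break
        # Split on the '|' column separator (a plain split() would return '|' as fields)
        parts = [p.strip() for p in line.split('|')]
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        memdict[parts[0]] = {
            'Budget': int(parts[1]),
            'Size': int(parts[2]),
            'Prev Size': int(parts[3]),
            'Overflow': parts[4],
        }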
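To try the keep/skip approach above without a log file, here is a self-contained sketch that runs the same logic over the sample text from the question. Since the question does not say what ends the section, it assumes the first blank line after the rows ends the table. It also splits each row on '|' rather than on whitespace, because a plain split() would return the '|' separators as fields and int('|') would fail. The names parse_table and sample are hypothetical.

```python
def parse_table(lines):
    keeper = False  # True once the 'Section Name' header has been seen
    memdict = {}
    for line in lines:
        if not keeper:
            if 'Section Name' in line:
                keeper = True
            continue
        if '-------------------' in line:
            continue  # separator row under the header
        if not line.strip():
            break  # assumption: the first blank line after the rows ends the table
        parts = [p.strip() for p in line.split('|')]
        memdict[parts[0]] = {
            'Budget': int(parts[1]),
            'Size': int(parts[2]),
            'Prev Size': int(parts[3]),
            'Overflow': parts[4],
        }
    return memdict


sample = '''
 Section Name                   | Budget    | Size      | Prev Size | Overflow
 --------------------------------+-----------+-----------+-----------+----------
  .text.resident                 |    712924 |    794576 |    832688 | YES
  .rodata.resident               |     77824 |     77560 |     21496 | YES
  .data.resident                 |     28672 |     28660 |     42308 | NO
  .bss.resident                  |     52672 |   1051632 |   1455728 | YES

.
'''

result = parse_table(sample.splitlines())
print(result['.bss.resident'])
# {'Budget': 52672, 'Size': 1051632, 'Prev Size': 1455728, 'Overflow': 'YES'}
```

With a real file, the open file object can be passed directly, e.g. with open(path) as f: memdict = parse_table(f), where path is your log file. The trailing newline on each line is removed by strip().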
