
Parsing a fixed-width file in Python with Big Decimals

I have to parse the following file in Python:

20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0

I need to end up with the following variables (the first line is parsed as an example):

year = 2010
month = 03
day = 22
hour = 23
minute = 24
p1 = Decimal('1.355800')
p2 = Decimal('1.355900')
p3 = Decimal('1.355800')
p4 = Decimal('1.355900')

I have tried:

from decimal import Decimal

line = '20100322;232400;1.355800;1.355900;1.355800;1.355900;0'
year = line[:4]
month = line[4:6]
day = line[6:8]
hour = line[9:11]
minute = line[11:13]
p1 = Decimal(line[16:24])
p2 = Decimal(line[25:33])
p3 = Decimal(line[34:42])
p4 = Decimal(line[43:51])

print(year)
print(month)
print(day)
print(hour)
print(minute)
print(p1)
print(p2)
print(p3)
print(p4)

This works fine, but I am wondering whether there is an easier way to parse this (maybe using struct) that avoids having to count each position manually.

from decimal import Decimal
from datetime import datetime

line = "20100322;232400;1.355800;1.355900;1.355800;1.355900;0"

tokens = line.split(";")

dt = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
decimals = [Decimal(string) for string in tokens[2:6]]

# datetime objects also have some useful attributes: dt.year, dt.month, etc.
print(dt, *decimals, sep="\n")

Output:

2010-03-22 23:24:00
1.355800
1.355900
1.355800
1.355900
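The same split-based idea extends to a whole file. The following is only a sketch: the helper name parse_rows is made up for illustration, and io.StringIO stands in for a real file object. It uses the standard csv module with ";" as the delimiter, so no character positions need to be counted.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal


def parse_rows(lines):
    """Yield (datetime, list of four Decimals) for each non-empty record.

    parse_rows is a hypothetical helper name, not part of any library.
    """
    for row in csv.reader(lines, delimiter=";"):
        if not row:  # csv.reader yields [] for blank lines
            continue
        timestamp = datetime.strptime(row[0] + row[1], "%Y%m%d%H%M%S")
        prices = [Decimal(field) for field in row[2:6]]
        yield timestamp, prices


# io.StringIO stands in for an open file here; with a real file, pass the
# object returned by open(path, newline="") instead.
sample = io.StringIO(
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0\n"
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0\n"
    "20100322;232600;1.355800;1.355800;1.355800;1.355800;0\n"
)

records = list(parse_rows(sample))
for timestamp, prices in records:
    print(timestamp, *prices)
```

Because parse_rows is a generator, rows are converted one at a time; the list() call is only there to make the result easy to inspect.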

You could use regex:

import re

to_parse = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""

stx = re.compile(
    r'(?P<date>(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2}));'
    r'(?P<time>(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2}));' 
    r'(?P<p1>[\.\-\d]*);(?P<p2>[\.\-\d]*);(?P<p3>[\.\-\d]*);(?P<p4>[\.\-\d]*)'
    )

f = [
    {k: float(v) if k.startswith('p') else int(v) for k, v in a.groupdict().items()}
    for a in stx.finditer(to_parse)
]

print(f)

Output:

[{'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 24,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3559,
  'p3': 1.3558,
  'p4': 1.3559,
  'second': 0,
  'time': 232400,
  'year': 2010},
 {'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 25,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3559,
  'p3': 1.3558,
  'p4': 1.3559,
  'second': 0,
  'time': 232500,
  'year': 2010},
 {'date': 20100322,
  'day': 22,
  'hour': 23,
  'minute': 26,
  'month': 3,
  'p1': 1.3558,
  'p2': 1.3558,
  'p3': 1.3558,
  'p4': 1.3558,
  'second': 0,
  'time': 232600,
  'year': 2010}]

Here I stored everything in a list, but you could instead loop over the results of finditer one match at a time if you don't want to keep everything in memory.
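As a rough sketch of that lazy variant, the generator below matches one line at a time and yields each record as soon as it is parsed. The names RECORD, iter_records and sample_lines are illustrative assumptions, and the pattern is a slightly trimmed version of the one above (no combined date/time groups, and + instead of * so that empty price fields do not match).

```python
import re

# Trimmed variant of the pattern above (an assumption for this sketch).
RECORD = re.compile(
    r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2});'
    r'(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2});'
    r'(?P<p1>[.\-\d]+);(?P<p2>[.\-\d]+);(?P<p3>[.\-\d]+);(?P<p4>[.\-\d]+)'
)


def iter_records(lines):
    """Yield one dict per matching line; lines that do not match are skipped."""
    for line in lines:
        match = RECORD.match(line)
        if match is None:
            continue
        yield {
            key: float(value) if key.startswith('p') else int(value)
            for key, value in match.groupdict().items()
        }


# A list stands in for an open file; any iterable of lines works.
sample_lines = [
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0\n",
    "\n",
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0\n",
]

lazy_records = list(iter_records(sample_lines))
for record in lazy_records:
    print(record['hour'], record['minute'], record['p1'])
```

Passing an open file object instead of sample_lines would read and parse the file one line at a time.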

You can also replace float and/or int with Decimal if needed.
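Since the question asks for Decimal values, here is a minimal sketch of that swap. The helper convert and the set PRICE_KEYS are hypothetical names introduced for this example; the pattern is the one from this answer without the combined date and time groups.

```python
import re
from decimal import Decimal

stx = re.compile(
    r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2});'
    r'(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2});'
    r'(?P<p1>[.\-\d]*);(?P<p2>[.\-\d]*);(?P<p3>[.\-\d]*);(?P<p4>[.\-\d]*)'
)

# Hypothetical helper names, used only in this sketch.
PRICE_KEYS = {'p1', 'p2', 'p3', 'p4'}


def convert(key, value):
    """Turn price fields into Decimal and every other field into int."""
    return Decimal(value) if key in PRICE_KEYS else int(value)


line = '20100322;232400;1.355800;1.355900;1.355800;1.355900;0'
match = stx.match(line)
parsed = {key: convert(key, value) for key, value in match.groupdict().items()}

print(parsed['year'], parsed['month'], parsed['p1'])
```

Unlike float, which prints 1.3558 in the output above, Decimal is built from the original string and keeps the trailing zeros of 1.355800.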
