
Python Read Formatted String

I have a file with a number of lines formatted with the following syntax:

FIELD      POSITION  DATA TYPE
------------------------------
COOP ID       1-6    Character
LATITUDE     8-15    Real
LONGITUDE   17-25    Real
ELEVATION   27-32    Real
STATE       34-35    Character
NAME        37-66    Character
COMPONENT1  68-73    Character
COMPONENT2  75-80    Character
COMPONENT3  82-87    Character
UTC OFFSET  89-90    Integer

The data is all ASCII-formatted.

An example of a line is:

011084  31.0581  -87.0547   26.0 AL BREWTON 3 SSE                  ------ ------ ------ +6

My current thought is that I'd like to read the file one line at a time and somehow have each line broken up into a dictionary so I can refer to the components. Is there a module that does this in Python, or some other clean way?

Thanks!

EDIT: You can still use the struct module:

See the struct module documentation. It looks to me like you want to use struct.unpack().

What you want is probably something like:

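import struct

# One "Ns" per field; each "x" skips a single-space separator. 90 bytes in total.
record = struct.Struct("6sx8sx9sx6sx2sx30sx6sx6sx6sx2s")

with open("filename.txt", "rb") as f:  # struct unpacks bytes, so read in binary mode
    for line in f:
        # Remove only the newline, then pad so the length matches the struct size.
        raw = line.rstrip(b"\r\n").ljust(record.size)[:record.size]
        (coop_id, lat, lon, elev, state, name, c1, c2, c3, utc_offset
         ) = (field.decode("ascii").strip() for field in record.unpack(raw))
        (lat, lon, elev) = map(float, (lat, lon, elev))
        utc_offset = int(utc_offset)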
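
Since the question asks for a dictionary, the unpacked fields can be paired with names. The following is a minimal sketch, not a definitive implementation: the key names in `NAMES` and the helper `unpack_line` are made up for illustration, and the sample record is rebuilt from the example line in the question.

```python
import struct

# Key names are illustrative assumptions; the layout string follows the
# positions listed in the question (90 bytes per record).
NAMES = ("coop_id", "latitude", "longitude", "elevation", "state",
         "name", "component1", "component2", "component3", "utc_offset")
RECORD = struct.Struct("6sx8sx9sx6sx2sx30sx6sx6sx6sx2s")


def unpack_line(line):
    """Return one fixed-width record as a dict of stripped strings."""
    raw = line.rstrip("\r\n").encode("ascii")
    # Pad or truncate so the buffer length matches the struct size exactly.
    raw = raw.ljust(RECORD.size)[:RECORD.size]
    values = (field.decode("ascii").strip() for field in RECORD.unpack(raw))
    return dict(zip(NAMES, values))


# The example line from the question; ljust(30) pads the NAME field.
sample = ("011084  31.0581  -87.0547   26.0 AL "
          + "BREWTON 3 SSE".ljust(30)
          + " ------ ------ ------ +6")
record = unpack_line(sample)
```

All values come back as strings here; converting the numeric ones with float() and int() is left to the caller.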

I think I understand from your question and comments what you are looking for. If we assume that Real, Character, and Integer are the only data types, then the following code should work. (I will also assume that the format file you showed is tab-delimited.)

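fields = {}
types = {"Real": float, "Character": str, "Integer": int}

with open("format.txt") as f:
    for line in f:
        values = [value.strip() for value in line.split("\t")]
        # Skip the header row, the dashed rule, and blank lines.
        if len(values) != 3 or values[2] not in types:
            continue
        start, end = values[1].split("-")
        # Positions are 1-based and inclusive; slices are 0-based and end-exclusive.
        fields[values[0]] = {"start": int(start) - 1, "end": int(end),
                             "type": types[values[2]]}

results = []
with open("filename.txt") as f:
    for line in f:
        result = {}
        for key, spec in fields.items():
            # Strip the column padding before converting to the target type.
            result[key] = spec["type"](line[spec["start"]:spec["end"]].strip())
        results.append(result)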

You should end up with results containing a list of dictionaries, where each dictionary maps the key names in the format file to data values of the correct type.
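
If the format description turns out to be aligned with spaces rather than tabs (field names such as "COOP ID" contain a space, so a plain split() would not work), a regular expression is one way to read it. This is only a sketch under that assumption; `FORMAT_TEXT`, `ROW`, and `parse_format` are hypothetical names, and the table is copied from the question.

```python
import re

FORMAT_TEXT = """\
FIELD      POSITION  DATA TYPE
------------------------------
COOP ID       1-6    Character
LATITUDE     8-15    Real
LONGITUDE   17-25    Real
ELEVATION   27-32    Real
STATE       34-35    Character
NAME        37-66    Character
COMPONENT1  68-73    Character
COMPONENT2  75-80    Character
COMPONENT3  82-87    Character
UTC OFFSET  89-90    Integer
"""

TYPES = {"Real": float, "Character": str, "Integer": int}

# name (may contain spaces), then "start-end", then the type word
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<start>\d+)-(?P<end>\d+)\s+(?P<type>\w+)\s*$")


def parse_format(text):
    """Map each field name to (slice_start, slice_end, converter)."""
    spec = {}
    for line in text.splitlines():
        match = ROW.match(line)
        # The header row and the dashed rule do not match and are skipped.
        if match is None or match["type"] not in TYPES:
            continue
        # Convert 1-based inclusive positions to 0-based, end-exclusive slices.
        spec[match["name"]] = (int(match["start"]) - 1,
                               int(match["end"]),
                               TYPES[match["type"]])
    return spec


spec = parse_format(FORMAT_TEXT)
```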

It seems like you could write a function using strings and slices fairly simply. string[0:6] would be the first field. Does it need to be extensible, or is it likely a one-off?
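
For the one-off case, a slicing function might look like the sketch below. The `FIELDS` table is hard-coded from the positions in the question, and `parse_line` is a hypothetical helper name rather than anything from a library.

```python
# (name, first column, last column, converter); columns are 1-based and
# inclusive, as listed in the question.
FIELDS = [
    ("COOP ID", 1, 6, str),
    ("LATITUDE", 8, 15, float),
    ("LONGITUDE", 17, 25, float),
    ("ELEVATION", 27, 32, float),
    ("STATE", 34, 35, str),
    ("NAME", 37, 66, str),
    ("COMPONENT1", 68, 73, str),
    ("COMPONENT2", 75, 80, str),
    ("COMPONENT3", 82, 87, str),
    ("UTC OFFSET", 89, 90, int),
]


def parse_line(line):
    """Slice one fixed-width line into a dict of converted values."""
    record = {}
    for name, first, last, convert in FIELDS:
        # Column N sits at index N - 1; the slice end is exclusive.
        record[name] = convert(line[first - 1:last].strip())
    return record


# The example line from the question; ljust(30) pads the NAME field.
sample = ("011084  31.0581  -87.0547   26.0 AL "
          + "BREWTON 3 SSE".ljust(30)
          + " ------ ------ ------ +6")
parsed = parse_line(sample)
```

Keeping the layout in a single table means a changed column only needs one edit, which helps if the one-off later needs to be extended.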
