
Writing/parsing a fixed width file using Python

I'm a newbie to Python and I'm looking at using it to write some hairy EDI stuff that our supplier requires.

Basically they need an 80-character fixed width text file, with certain "chunks" of the field with data and others left blank. I have the documentation so I know what the length of each "chunk" is. The response that I get back is easier to parse since it will already have data and I can use Python's "slices" to extract what I need, but I can't assign to a slice. I tried that already because it sounded like a good solution, and it didn't work since Python strings are immutable :)

Like I said I'm really a newbie to Python but I'm excited about learning it :) How would I go about doing this? Ideally I'd want to be able to say that range 10-20 is equal to "Foo" and have it be the string "Foo" with 7 additional whitespace characters (assuming said field has a length of 10) and have that be a part of the larger 80-character field, but I'm not sure how to do what I'm thinking.

You don't need to assign to slices, just build the string using % formatting.

An example with a fixed format for 3 data items:

>>> fmt="%4s%10s%10s"
>>> fmt % (1,"ONE",2)
'   1       ONE         2'
>>> 

Same thing, with the field widths supplied alongside the data:

>>> fmt2 = "%*s%*s%*s"
>>> fmt2 % (4,1, 10,"ONE", 10,2)
'   1       ONE         2'
>>> 

Separating data and field widths, and using zip() and str.join() tricks:

>>> widths=(4,10,10)
>>> items=(1,"ONE",2)
>>> "".join("%*s" % i for i in zip(widths, items))
'   1       ONE         2'
>>> 
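
The examples above right-justify every value, whereas the question asks for "Foo" followed by padding, which is left-justification. In % formatting that is the `-` flag. A minimal sketch along the same lines; the widths and values are made up for illustration and do not come from any real EDI layout:

```python
# Illustrative widths and values only, not taken from a real spec.
widths = (4, 10, 10)
items = (1, "ONE", 2)

# "%-*s" consumes (width, value) just like "%*s", but pads on the right.
line = "".join("%-*s" % pair for pair in zip(widths, items))

# Left-justify the whole line in an 80-column record.
record = "%-80s" % line

print(repr(line))
print(len(record))
```

Padding the finished line with `%-80s` is one way to reach the 80-character record length without listing every blank field explicitly.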

Hopefully I understand what you're looking for: some way to conveniently identify each part of the line by a simple variable, but output it padded to the correct width?

The snippet below may give you what you want:

class FixWidthFieldLine(object):

    fields = (('foo', 10),
              ('bar', 30),
              ('ooga', 30),
              ('booga', 10))

    def __init__(self):
        self.foo = ''
        self.bar = ''
        self.ooga = ''
        self.booga = ''

    def __str__(self):
        return ''.join([getattr(self, field_name).ljust(width) 
                        for field_name, width in self.fields])

f = FixWidthFieldLine()
f.foo = 'hi'
f.bar = 'joe'
f.ooga = 'howya'
f.booga = 'doin?'

print(f)

This yields:

hi        joe                           howya                         doin?     

It works by storing a class-level variable, fields, which records the order in which each field should appear in the output, together with the number of columns that field should have. There are correspondingly named instance variables in __init__ that are initially set to an empty string.

The __str__ method outputs these values as a string. It uses a list comprehension over the class-level fields attribute, looking up the instance value for each field by name, and then left-justifying its output according to the columns. The resulting list of fields is then joined together by an empty string.

Note this doesn't parse input, though you could easily override the constructor to take a string and parse the columns according to the field names and field widths in fields. It also doesn't check for instance values that are longer than their allotted width.
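
Both gaps mentioned above could be filled roughly as follows. This is only a sketch under assumptions: the class name FixedWidthRecord is hypothetical, the layout is copied from the example above, and raising ValueError on overflow is one possible policy (truncating would be another).

```python
class FixedWidthRecord(object):
    # Same (name, width) layout as the example above; 80 columns in total.
    fields = (('foo', 10), ('bar', 30), ('ooga', 30), ('booga', 10))

    def __init__(self, line=''):
        # Walk the layout and slice each column out of the input line.
        # A short or empty line simply yields empty strings.
        pos = 0
        for name, width in self.fields:
            # rstrip() drops the padding; trailing spaces that were part
            # of the data would be lost as well.
            setattr(self, name, line[pos:pos + width].rstrip())
            pos += width

    def __str__(self):
        parts = []
        for name, width in self.fields:
            value = getattr(self, name)
            if len(value) > width:
                # Assumed policy: refuse to write an over-long value.
                raise ValueError('%s is longer than %d characters' % (name, width))
            parts.append(value.ljust(width))
        return ''.join(parts)


rec = FixedWidthRecord('hi'.ljust(10) + 'joe'.ljust(30))
rec.booga = 'doin?'
print(rec.foo, rec.bar)
print(len(str(rec)))
```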

You can use the justify functions to left-justify, right-justify and center a string in a field of a given width.

'hi'.ljust(10) -> 'hi        '
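
For completeness, a short sketch of all three string methods side by side. The variable names are only for illustration; note in particular that none of these methods truncates, so a value longer than the field comes back unchanged.

```python
width = 10

left = 'hi'.ljust(width)          # padding on the right
right = 'hi'.rjust(width)         # padding on the left
middle = 'hi'.center(width)       # padding on both sides
zeros = '42'.rjust(width, '0')    # optional fill character

# No truncation: an over-long value is returned as-is.
too_long = 'much too long'.ljust(width)

for value in (left, right, middle, zeros, too_long):
    print(repr(value))
```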

I know this thread is quite old, but we use a library called django-copybook. It has nothing to do with Django (anymore). We use it to go between fixed-width COBOL files and Python. You create a class to define your fixed-width record layout and can easily move between typed Python objects and fixed-width files:

USAGE:
class Person(Record):
    first_name = fields.StringField(length=20)
    last_name = fields.StringField(length=30)
    siblings = fields.IntegerField(length=2)
    birth_date = fields.DateField(length=10, format="%Y-%m-%d")

>>> fixedwidth_record = 'Joe                 Smith                         031982-09-11'
>>> person = Person.from_record(fixedwidth_record)
>>> person.first_name
'Joe'
>>> person.last_name
'Smith'
>>> person.siblings
3
>>> person.birth_date
datetime.date(1982, 9, 11)

It can also handle situations similar to COBOL's OCCURS functionality, such as when a particular section is repeated X times.

I used Jarret Hardie's example and modified it slightly. This allows for selection of the type of text alignment (left, right or centered).

class FixedWidthFieldLine(object):
    def __init__(self, fields, justify = 'L'):
        """ Returns line from list containing tuples of field values and lengths. Accepts
            justification parameter.
            FixedWidthFieldLine(fields[, justify])

            fields = [(value, fieldLength)[, ...]]
        """
        self.fields = fields

        if (justify in ('L','C','R')):
            self.justify = justify
        else:
            self.justify = 'L'

    def __str__(self):
        if(self.justify == 'L'):
            return ''.join([field[0].ljust(field[1]) for field in self.fields])
        elif(self.justify == 'R'):
            return ''.join([field[0].rjust(field[1]) for field in self.fields])
        elif(self.justify == 'C'):
            return ''.join([field[0].center(field[1]) for field in self.fields])

fieldTest = [('Alex', 10),
         ('Programmer', 20),
         ('Salem, OR', 15)]

f = FixedWidthFieldLine(fieldTest)
print(f)
f = FixedWidthFieldLine(fieldTest, 'R')
print(f)

Returns:

Alex      Programmer          Salem, OR      
      Alex          Programmer      Salem, OR

It's a little difficult to parse your question, but I'm gathering that you are receiving a file or file-like object, reading it, and replacing some of the values with some business logic results. Is this correct?

The simplest way to overcome string immutability is to write a new string:

# Won't work:
test_string[3:6] = "foo"

# Will work:
test_string = test_string[:3] + "foo" + test_string[6:]

Having said that, it sounds like it's important to you that you do something with this string, but I'm not sure exactly what that is. Are you writing it back to an output file, trying to edit a file in place, or something else? I bring this up because the act of creating a new string (which happens to have the same variable name as the old string) should emphasize the necessity of performing an explicit write operation after the transformation.
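
To illustrate that last point, here is a minimal sketch of the read, transform, write cycle. The io.StringIO objects stand in for real input and output files, and the sample data and column positions are invented for the example.

```python
import io

# StringIO objects stand in for the real input and output files.
source = io.StringIO("AAA   111\nBBB   222\n")
target = io.StringIO()

for line in source:
    body = line.rstrip("\n")
    # Build a new string with columns 3-5 replaced; the original
    # string object is never modified.
    body = body[:3] + "foo" + body[6:]
    # The new string only exists in memory until it is written out.
    target.write(body + "\n")

print(target.getvalue())
```

With real files, the same loop would read from one open file and write to another; rebinding the variable alone changes nothing on disk.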

You can convert the string to a list and do the slice manipulation. Note that a replacement of a different length changes the overall length, as the second assignment shows.

>>> text = list("some text")
>>> text[0:4] = list("fine")
>>> text
['f', 'i', 'n', 'e', ' ', 't', 'e', 'x', 't']
>>> text[0:4] = list("all")
>>> text
['a', 'l', 'l', ' ', 't', 'e', 'x', 't']
>>> "".join(text)
'all text'

It is easy to write a function to "modify" a string.

def change(string, start, end, what):
    length = end - start
    if len(what)<length: what = what + " "*(length-len(what))
    return string[0:start]+what[0:length]+string[end:]

Usage:

test_string = 'This is test string'

print(test_string[5:7])
# is
test_string = change(test_string, 5, 7, 'IS')
# This IS test string
test_string = change(test_string, 8, 12, 'X')
# This IS X    string
test_string = change(test_string, 8, 12, 'XXXXXXXXXXXX')
# This IS XXXX string
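
Applied to the original question, the same idea can fill an 80-character record by column range. A sketch only: the helper is restated here (using ljust for the padding) so the block runs on its own, and the second field's position and value are invented for illustration.

```python
def change(string, start, end, what):
    # Overwrite columns start..end-1, padding or truncating `what`
    # so that the overall length never changes.
    length = end - start
    return string[:start] + what[:length].ljust(length) + string[end:]


# Start from a blank 80-column record and drop fields into place.
record = " " * 80
record = change(record, 10, 20, "Foo")      # "Foo" plus 7 spaces
record = change(record, 40, 45, "12345")    # hypothetical second field

print(repr(record[10:20]))
print(len(record))
```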
