
Python reading text files for calculation

I have hundreds of *.txt files that I need to open. Each text file has 4 coordinates (x y):

401 353
574 236
585 260
414 376

I need to read each of them for a simple calculation. What I have so far is:

import sys,os

if __name__ == '__main__':
    if len(sys.argv) > 1:
        path = sys.argv[1]
    else:
        path = os.getcwd() + '/'
    try:
        filt = set([".txt", ".TXT"])

        sortlist = []
        sortlist = os.listdir(path)
        sortlist.sort()

        for item in sortlist:
            fileType = item[-4:]
            if fileType in filt:
                CurrentFile = open(item, 'r')
                TextInCurrentFile = CurrentFile.read()
                print TextInCurrentFile     # Printing textfiles content.
    except Exception, e:
        print e

The first problem is that it doesn't sort the files correctly. I would prefer them in both numerical and alphabetical order.

But my main concern is how to define the values from each file: (X0, Y0, X1, Y1, X2, Y2, X3, Y3)

Would it also be possible to read from another file with the same file name, located in another folder, and include it in the calculation? I'm going to make some comparisons for each file and log the overall results.

Let's take this problem step by step. The first step is actually getting the required files in order. I like to use the glob module, but if you want your match to be case insensitive you will be better off using the re module. Sorting can then be done with the sorted function.

import os
import re
import fnmatch

rule = re.compile(fnmatch.translate('*.txt'), re.IGNORECASE)
# print() with parentheses works on both Python 2 and Python 3
print(sorted(fname for fname in os.listdir('.') if rule.match(fname)))
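
The question also asks for an order that is both numerical and alphabetical, which plain sorted does not give: it compares names character by character, so '10.txt' lands before '2.txt'. One possible approach, sketched here with a hypothetical helper called natural_key (not part of the original answer), is to split each name into digit and non-digit chunks and compare the digit chunks as integers:

```python
import re

def natural_key(name):
    """Split a name into text and number chunks so that '2' sorts before '10'."""
    # The capturing group keeps the digit runs in the result of re.split,
    # so chunks alternate: text, digits, text, digits, ...
    parts = re.split(r'(\d+)', name.lower())
    # Chunks at odd indexes are the digit runs; compare those as integers.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# Made-up file names, only to show the resulting order.
names = ['10.txt', '2.txt', 'b1.TXT', 'a3.txt', '1.txt']
print(sorted(names, key=natural_key))
# ['1.txt', '2.txt', '10.txt', 'a3.txt', 'b1.TXT']
```

Lower-casing inside the key makes the ordering case insensitive, in line with the re.IGNORECASE match above; drop the .lower() call if case should matter.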

Now, because the data format is fixed, you can approach this by simply using a list of namedtuple instances to contain the data. The code could look something like this:

import os
import re
import fnmatch
import collections

coords_t = collections.namedtuple('coords_t', ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3'])
data_collection = []

rule = re.compile(fnmatch.translate('*.txt'), re.IGNORECASE)
for fname in sorted([name for name in os.listdir('.') if rule.match(name)]):
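    with open(fname, 'r') as f:
        data = f.read()
        # split() with no argument copes with newlines, repeated spaces and a
        # missing trailing newline; int() makes the values usable in arithmetic
        data_collection.append(coords_t(*map(int, data.split())))

print(data_collection)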

Now you have the data saved as a list of namedtuple instances in the data_collection variable, and you can do the required calculations. Also, it is better to use the with context manager when working with files, as it closes the file for you even if an exception is raised.
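
The question does not say which calculation is needed, so the following is only an illustration. It assumes, purely for the sake of the example, that the four points are the corners of a quadrilateral listed in order around its edge and that its area is wanted. The helper names parse_coords and quad_area are hypothetical; the sample text is the file content shown in the question.

```python
import collections

coords_t = collections.namedtuple(
    'coords_t', ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3'])

def parse_coords(text):
    """Turn the text of one file into a coords_t of integers."""
    values = [int(v) for v in text.split()]
    if len(values) != 8:
        raise ValueError('expected 8 numbers, got %d' % len(values))
    return coords_t(*values)

def quad_area(c):
    """Area by the shoelace formula, taking the corners in file order."""
    xs = [c.x0, c.x1, c.x2, c.x3]
    ys = [c.y0, c.y1, c.y2, c.y3]
    twice_area = sum(xs[i] * ys[(i + 1) % 4] - xs[(i + 1) % 4] * ys[i]
                     for i in range(4))
    return abs(twice_area) / 2.0

sample = "401 353\n574 236\n585 260\n414 376\n"
c = parse_coords(sample)
print(quad_area(c))  # 5440.0
```

Whatever the real calculation is, the pattern stays the same: convert the strings to numbers once while parsing, then work with the named fields.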

It also depends on the resulting format you want to achieve. For example, if you want to know which coordinates belong to which file, a dictionary keyed by file name would be a better choice than a list, with entries such as:

{fname: coords_t(*map(int, data.split()))}

Using namedtuple gives you "nice" access to its values with dot notation, such as data_collection[0].x0.
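
The last part of the question, reading a file with the same name from another folder, is not covered above. A minimal sketch, assuming both folders hold files in the same 4-line format and that "comparison" means the element-wise difference of the eight values (an assumption, since the question does not define it), could build one dictionary per folder and pair the entries by file name. load_folder and compare_folders are hypothetical helper names.

```python
import os
import collections

coords_t = collections.namedtuple(
    'coords_t', ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3'])

def load_folder(folder):
    """Map each .txt file name in folder to its coords_t of integers."""
    result = {}
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith('.txt'):
            continue
        # Join the folder and the name; a bare name would be resolved
        # against the current working directory instead.
        with open(os.path.join(folder, name), 'r') as f:
            result[name] = coords_t(*[int(v) for v in f.read().split()])
    return result

def compare_folders(folder_a, folder_b):
    """Yield (name, differences) for every file name present in both folders."""
    a = load_folder(folder_a)
    b = load_folder(folder_b)
    for name in sorted(set(a) & set(b)):
        # Element-wise difference of the eight values, in field order.
        yield name, tuple(p - q for p, q in zip(a[name], b[name]))
```

Iterating over compare_folders('measured', 'reference'), where both folder names are placeholders, gives one (name, differences) pair per shared file, which can then be printed or written to a log. Files that exist in only one of the folders are skipped silently; report them separately if that matters.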
