
Python reading text files for calculation

I have hundreds of *.txt files that I need to open. Each text file has 4 coordinates (x y):

401 353
574 236
585 260
414 376

I need to read each of them for a simple calculation. What I have so far is:

import sys,os

if __name__ == '__main__':
    if len(sys.argv) > 1:
        path = sys.argv[1]
    else:
        path = os.getcwd() + '/'
    try:
        filt = set([".txt", ".TXT"])

        sortlist = []
        sortlist = os.listdir(path)
        sortlist.sort()

        for item in sortlist:
            fileType = item[-4:]
            if fileType in filt:
                CurrentFile = open(item, 'r')
                TextInCurrentFile = CurrentFile.read()
                print TextInCurrentFile     # Printing textfiles content.
    except Exception, e:
        print e

The first problem is that it doesn't sort the files correctly. I would prefer them in both numerical and alphabetical order.

But my main concern is how to define: (X0, Y0, X1, Y1, X2, Y2, X3, Y3)

Would it also be possible to read from another file with the same file name, located in another folder, and include it in the calculation? I'm going to compare each pair of files and log the overall results.

Let's take this problem step by step. The first step is getting the required files in order. I like to use the glob module, but if you want the match to be case insensitive you are better off using the re module. Sorting can then be done with the sorted function.

import os
import re
import fnmatch

# fnmatch.translate turns the shell pattern into a regex; IGNORECASE also matches .TXT
rule = re.compile(fnmatch.translate('*.txt'), re.IGNORECASE)
print(sorted(fname for fname in os.listdir('.') if rule.match(fname)))
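
Note that sorted on its own compares names character by character, so file10.txt ends up before file2.txt. If the "numerical and alphabetical" order from the question is what you want, one possible approach is a natural sort key. The following is only a sketch: natural_key is a hypothetical helper name and the file names are made up for illustration.

```python
import re

def natural_key(name):
    """Build a sort key in which digit runs compare as numbers."""
    # Splitting on a capturing group keeps the digit runs in the result,
    # so the parts alternate: text, digits, text, digits, ...
    parts = re.split(r'(\d+)', name)
    # Odd positions are the digit runs; compare the rest case-insensitively.
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(parts)]

names = ['file10.txt', 'file2.txt', 'File1.TXT', 'notes.txt']
ordered = sorted(names, key=natural_key)
print(ordered)
```

Passing key=natural_key to the sorted call above should give the same ordering for the real directory listing.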

Now, because the data format is fixed, you can approach this by simply using a list of namedtuples to hold the data. The code could look something like this:

import os
import re
import fnmatch
import collections

coords_t = collections.namedtuple('coords_t', ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3'])
data_collection = []

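rule = re.compile(fnmatch.translate('*.txt'), re.IGNORECASE)
for fname in sorted(name for name in os.listdir('.') if rule.match(name)):
    with open(fname, 'r') as f:
        # split() with no argument splits on any whitespace, so a missing
        # trailing newline or extra blanks cannot drop or corrupt a value.
        values = [int(v) for v in f.read().split()]
    # Expects exactly 8 numbers per file, converted to int for calculations.
    data_collection.append(coords_t(*values))

print(data_collection)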

Now you have the data saved as a list of namedtuples in the data_collection variable, and you can do the required calculations. It is also better to open files with the with context manager, because it closes the file for you even if an exception is raised while reading.
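
The question does not say which calculation is needed, so the following is only an illustration. It assumes the four points are the corners of a quadrilateral listed in order and computes its area with the shoelace formula. parse_coords and quad_area are hypothetical helper names, and the sample text is the file content shown in the question.

```python
import collections

coords_t = collections.namedtuple(
    'coords_t', ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3'])

def parse_coords(text):
    """Turn the text of one file into a coords_t of integers."""
    return coords_t(*[int(v) for v in text.split()])

def quad_area(c):
    """Area of the quadrilateral (x0, y0) .. (x3, y3), corners in order."""
    xs = [c.x0, c.x1, c.x2, c.x3]
    ys = [c.y0, c.y1, c.y2, c.y3]
    total = 0
    for i in range(4):
        j = (i + 1) % 4  # next corner, wrapping back to the first
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0

sample = "401 353\n574 236\n585 260\n414 376\n"
coords = parse_coords(sample)
area = quad_area(coords)
print(area)
```

Any other calculation would follow the same pattern: convert the values to numbers once, then work with the named fields.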

The best container also depends on the result you want. For example, if you want to know which coordinates belong to which file, a dictionary is a better choice than a list, using

{fname: coords_t(*[int(v) for v in data.split()])}

Using a namedtuple gives you convenient access to its values through dot notation, such as data_collection[0].x0.
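
As for reading a file with the same name from another folder, a dictionary keyed by file name makes the pairing straightforward. This is a rough sketch rather than a finished solution: read_coords and paired_coords are hypothetical helper names, and it assumes both folders contain .txt files in the format shown in the question.

```python
import os

def read_coords(path):
    """Return the whitespace-separated integers in a file as a flat list."""
    with open(path, 'r') as f:
        return [int(v) for v in f.read().split()]

def paired_coords(dir_a, dir_b):
    """Map each .txt name in dir_a to (coords from dir_a, coords from dir_b).

    Names that have no counterpart in dir_b are skipped.
    """
    pairs = {}
    for name in sorted(os.listdir(dir_a)):
        if not name.lower().endswith('.txt'):
            continue
        other = os.path.join(dir_b, name)
        if os.path.isfile(other):
            pairs[name] = (read_coords(os.path.join(dir_a, name)),
                           read_coords(other))
    return pairs
```

With two folders called, say, measured and reference (made-up names), paired_coords('measured', 'reference') would return one entry per file present in both, ready for comparison and logging.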

