
Rename '.tbl' files in directory using string from the first line of file python

I have a directory filled with '.tbl' files. Each file is laid out as follows:

\STAR_ID = "HD 74156"

\DATA_CATEGORY = "Planet Radial Velocity Curve"

\NUMBER_OF_POINTS = "82"

\TIME_REFERENCE_FRAME = "JD"

\MINIMUM_DATE = "2453342.23249"

\DATE_UNITS = "days"

\MAXIMUM_DATE = "2454231.60002"

....

I need to rename every file in the directory using its STAR_ID, so in this case the file's name would be 'HD 74156.tbl'.

I have been able to do it for about 20 of the ~600 files.

I am not sure why it will not continue through the rest of the files.

My current code is:

for i in os.listdir(path):
    with open(i) as f:
        first_line = f.readline()
        system = first_line.split('"')[1]
        new_file = system + ".tbl"
        os.rename(file, new_file)

and the error message is:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-5883c060a977> in <module>
      3     with open(i) as f:
      4         first_line = f.readline()
----> 5         system = first_line.split('"')[1]
      6         new_file = system + ".tbl"
      7         os.rename(file, new_file)

IndexError: list index out of range

This error occurs because first_line.split('"') returns a list with fewer than 2 items, so index 1 does not exist. You can try:

first_line_ls = first_line.split('"')
if len(first_line_ls) > 1:
    system = first_line_ls[1]
else:
    # no quoted value on this line; handle it another way (e.g. skip the file)
    system = None

This prevents the error and lets you handle the cases where first_line contains fewer than two double-quote characters.
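To see which kinds of first line trigger the IndexError, here is a minimal sketch. The sample lines and the helper name quoted_value are made up for illustration; only the first sample comes from the question. It also requires a closing quote (at least three parts after the split), which is slightly stricter than the check above.

```python
# Illustrative first lines; only the first one matches the format in the question
samples = [
    '\\STAR_ID = "HD 74156"\n',   # expected format
    '\n',                          # blank first line
    '\\STAR_ID = HD 74156\n',      # value without quotes
]

def quoted_value(line):
    """Return the text between the first pair of double quotes, or None."""
    parts = line.split('"')
    # One quoted value yields at least 3 parts:
    # the text before it, the value itself, and the text after it.
    return parts[1] if len(parts) >= 3 else None

for line in samples:
    print(repr(line), '->', quoted_value(line))
```

Any file whose first line falls into the second or third category would raise the IndexError in the original loop.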

It looks like these .tbl files are not as uniform as you might have hoped. If this line:

----> 5         system = first_line.split('"')[1]

fails on some files, it's because their first line is not formatted as you expected, as @Leo Arad noted. You also want to make sure you're actually using the STAR_ID field. Perhaps these files usually put all the fields in the same order (as an aside, what are these .tbl files? What software did they come from? I've never seen it before), but since you've already found other inconsistencies with the format, better to be safe than sorry.

I might write a little helper function to parse the fields in this file. It takes a single line and returns a (key, value) tuple for the field. If the line does not look like a valid field, it returns (None, None):

import re

# Dissection of this regular expression:
# ^\\ : line begins with \
# (?P<key>\w+) : extract the key, which is one or more letters, numbers or underscores
# \s*=\s* : an equal sign surrounded by any amount of white space
# "(?P<value>[^"]*)" : extract the value, which is between a pair of double-quotes
#                      and contains any characters other than double-quotes
# (Note: I don't know if this file format has a mechanism for escaping
# double-quotes inside the value; if so that would have to be handled as well)
_field_re = re.compile(r'^\\(?P<key>\w+)\s*=\s*"(?P<value>[^"]*)"')

def parse_field(line):
    # match the line against the regular expression
    match = _field_re.match(line)
    # if it doesn't match, return (None, None)
    if match is None:
        return (None, None)
    else:
        # return the key and value pair
        return match.groups()
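A quick check of the helper against lines taken from the question. The pattern and function are repeated here in condensed form only so the snippet runs on its own:

```python
import re

# Same pattern as above, repeated so this snippet is self-contained
_field_re = re.compile(r'^\\(?P<key>\w+)\s*=\s*"(?P<value>[^"]*)"')

def parse_field(line):
    match = _field_re.match(line)
    return match.groups() if match else (None, None)

print(parse_field('\\STAR_ID = "HD 74156"'))    # ('STAR_ID', 'HD 74156')
print(parse_field('\\NUMBER_OF_POINTS = "82"'))  # ('NUMBER_OF_POINTS', '82')
print(parse_field('....'))                       # (None, None)
```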

Now open each file, loop over its lines, and perform the rename once you find STAR_ID. If no line contains STAR_ID, print a warning instead (this is mostly the same as your code with some slight variations):

import os
import sys

for filename in os.listdir(path):
    filename = os.path.join(path, filename)
    star_id = None

    # NOTE: Do the rename outside the with statement so that the
    # file is closed; on Linux it doesn't matter but on Windows
    # the rename will fail if the file is not closed first
    with open(filename) as fobj:
        for line in fobj:
            key, value = parse_field(line)
            if key == 'STAR_ID':
                star_id = value
                break


    if star_id is not None:
        os.rename(filename, os.path.join(path, star_id + '.tbl'))
    else:
        print(f'WARNING: STAR_ID key missing from {filename}', file=sys.stderr)
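Before renaming ~600 files for real, it may help to do a dry run first. The sketch below only reads the files and reports what would happen. The names find_star_id and plan_renames are hypothetical helpers, and the check for several files sharing one STAR_ID is an assumption on my part; the question does not say whether that occurs.

```python
import os
import re
from collections import defaultdict

# Matches only the STAR_ID field, e.g.  \STAR_ID = "HD 74156"
_star_id_re = re.compile(r'^\\STAR_ID\s*=\s*"([^"]*)"')

def find_star_id(filename):
    """Return the STAR_ID value from a file, or None if no line has one."""
    with open(filename) as fobj:
        for line in fobj:
            match = _star_id_re.match(line)
            if match:
                return match.group(1)
    return None

def plan_renames(path):
    """Hypothetical helper: work out the renames without touching any file.

    Returns (plan, missing), where plan maps each target name to the list
    of source files that would get it, and missing lists the files in
    which no STAR_ID field was found.
    """
    plan = defaultdict(list)
    missing = []
    for name in sorted(os.listdir(path)):
        source = os.path.join(path, name)
        if not (name.endswith('.tbl') and os.path.isfile(source)):
            continue  # skip sub-directories and unrelated files
        star_id = find_star_id(source)
        if star_id is None:
            missing.append(name)
        else:
            plan[star_id + '.tbl'].append(name)
    return dict(plan), missing
```

Calling plan_renames(path) and looking at missing, plus any target in plan with more than one source file, shows which files would be skipped or would collide before os.rename is ever called.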

If you are not comfortable with regular expressions (and really, who is?) it would be good to learn the basics as it's an extremely useful tool to have in your belt. However, this format is simple enough that you could get away with using simple string parsing methods like you were doing. Though I would still enhance it a bit to make sure you're actually getting the STAR_ID field. Something like this:

def parse_field(line):
    if '=' not in line:
        return (None, None)

    key, value = [part.strip() for part in line.split('=', 1)]

    if not key.startswith('\\'):  # also safe when key is empty
        return (None, None)
    else:
        key = key[1:]

    if len(value) < 2 or value[0] != '"' or value[-1] != '"':
        # still not a valid line assuming quotes are required
        return (None, None)
    else:
        return (key, value.split('"')[1])

This is similar to what you were doing, but a little more robust (and returns the key as well as the value). But you can see this is more involved than the regular expression version. It's actually more-or-less implementing the exact same logic as the regular expression, but more slowly and verbosely.
