
How do you combine three lists in Python using a dict?

I need to read through the lines in multiple files; the first value in each line is the runtime, the third is the job ID, and the fourth is the status. I have created a list to store each of these values. Now I don't understand how to connect these lists and sort them so that I get the lines with the 20 fastest runtimes. Does anybody have a suggestion for how I can do that? Thank you!

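import csv
import glob
import gzip
import os

jobList = []
timeList = []
eventList = []

for filePath in glob.glob(os.path.join(path1, '*.gz')):
    with gzip.open(filePath, 'rt', newline="") as file:
        reader = csv.reader(file)
        # loop over the reader only: an extra "for line in file" loop
        # would swallow the first line of each file
        for row in reader:
            runTime = row[0]
            ID = row[2]
            eventType = row[3]
            jobList.append(ID)
            timeList.append(runTime)
            eventList.append(eventType)

# deduplicating and sorting jobList on its own breaks its index alignment
# with timeList and eventList, so the printed columns no longer match up
jobList = sorted(set(jobList))
counter = len(jobList)
print("There are %s unique jobs." % (counter))
for i in range(20):
    print("#%s\t%s\t%s\t%s" % (i + 1, timeList[i], jobList[i], eventList[i]))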

Instead of using three different lists, you can use a single list and append tuples to it, like so:

combinedList.append((runTime, ID, eventType))

You can then sort combinedList as described in the Stack Overflow question "How to sort (list/tuple) of lists/tuples?"
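A minimal sketch of that approach, assuming the runtime column holds numbers stored as text. csv.reader always yields strings, so the runtime is converted with float() before sorting; otherwise "10" would sort before "9". The helper name top_fastest and the sample rows are hypothetical.

```python
def top_fastest(rows, n=20):
    """Return the n (runTime, ID, eventType) tuples with the smallest runtimes.

    rows: iterable of CSV rows (lists of strings); column 0 is the runtime,
    column 2 the job ID and column 3 the status, as in the question.
    """
    combinedList = []
    for row in rows:
        # keep the three related values together in one tuple
        combinedList.append((float(row[0]), row[2], row[3]))
    # tuples sort by their first element (the runtime) first
    combinedList.sort()
    return combinedList[:n]


# hypothetical sample rows standing in for csv.reader output
sample = [
    ["12.5", "x", "job-a", "done"],
    ["3.0", "x", "job-b", "failed"],
    ["10", "x", "job-c", "done"],
    ["9", "x", "job-d", "done"],
]
for rank, (runTime, ID, eventType) in enumerate(top_fastest(sample, n=3), start=1):
    print("#%s\t%s\t%s\t%s" % (rank, runTime, ID, eventType))
```

Because the runtime is the first element of each tuple, a plain sort() is enough; slicing with [:n] then gives the fastest n lines.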

You can make further improvements, such as using namedtuples (from the collections module). Look them up on SO or Google.

Note: there may be more efficient ways to do this. For example, you can use Python's heapq module to keep a heap of only 20 entries while reading, so you get the 20 fastest runtimes without sorting every line. You can learn more about heaps on the Python website or Stack Overflow, but you may need some more algorithmic background.
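As a sketch of the heap idea, assuming tuples shaped like combinedList above with a numeric runtime first: the standard library's heapq.nsmallest(n, iterable, key=...) keeps only n items while it scans. The second function does the same thing by hand with a bounded heap of size n. heapq only provides a min-heap, so it stores negated runtimes; the slowest job kept so far sits at the top and is evicted when a faster one arrives. Both function names are hypothetical.

```python
import heapq


def fastest_with_nsmallest(jobs, n=20):
    # jobs: iterable of (runTime, ID, eventType) tuples with numeric runtimes
    return heapq.nsmallest(n, jobs, key=lambda job: job[0])


def fastest_with_bounded_heap(jobs, n=20):
    if n <= 0:
        return []
    heap = []
    for count, job in enumerate(jobs):
        # negate runtime and position so the min-heap top is the slowest,
        # most recently seen job among those kept; count also prevents
        # Python from ever comparing the job tuples themselves
        entry = (-job[0], -count, job)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # strictly faster than the slowest kept job: replace it
            heapq.heapreplace(heap, entry)
    # reverse order of (-runtime, -count) = increasing runtime, then input order
    return [job for _, _, job in sorted(heap, reverse=True)]
```

The hand-written version only ever holds n entries, so memory stays small even for very large files; in practice heapq.nsmallest is the simpler choice.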

Instead of maintaining three lists (jobList, timeList, eventList), you can store (runTime, eventType) tuples in a dictionary, using ID as the key, by replacing

jobList = []
timeList = []
eventList = []
…
jobList.append(ID)
timeList.append(runTime)
eventList.append(eventType)

by

jobs = {}  # an empty dictionary
…
jobs[ID] = (runTime, eventType)

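Note that each assignment overwrites any earlier entry with the same ID, so only the last line seen for each job is kept. To loop over that dictionary sorted by increasing runTime values: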

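# runTime is read from the CSV as a string; float() makes "10" sort after "9"
for ID, (runTime, eventType) in sorted(jobs.items(), key=lambda item: float(item[1][0])):
    print(ID, runTime, eventType)  # do something with it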
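Putting the dictionary approach together, a sketch that also counts the unique jobs and keeps the 20 fastest. It assumes, as above, the runtime in column 0, the ID in column 2 and the status in column 3, and that runtimes are numeric; build_jobs, fastest_jobs and the sample rows are hypothetical. Because later rows overwrite earlier ones with the same ID, len(jobs) is the number of unique jobs.

```python
def build_jobs(rows):
    # map each job ID to its (runTime, eventType); later rows win
    jobs = {}
    for row in rows:
        jobs[row[2]] = (float(row[0]), row[3])
    return jobs


def fastest_jobs(jobs, n=20):
    # sort the (ID, (runTime, eventType)) pairs by runtime, keep the first n
    ordered = sorted(jobs.items(), key=lambda item: item[1][0])
    return ordered[:n]


# hypothetical rows: job-1 appears twice, so only its last line is kept
rows = [["7", "x", "job-1", "start"], ["2", "x", "job-2", "done"], ["9", "x", "job-1", "done"]]
jobs = build_jobs(rows)
print("There are %s unique jobs." % len(jobs))
for i, (ID, (runTime, eventType)) in enumerate(fastest_jobs(jobs), start=1):
    print("#%s\t%s\t%s\t%s" % (i, runTime, ID, eventType))
```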

Using Python's built-in sorted() would work better for you if you kept runTime, ID, and eventType together in one data structure. I would recommend using a namedtuple, as it lets you be clear about what each value means. You can do the following:

from collections import namedtuple
Job = namedtuple("Job", "runtime id event_type")
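For illustration, a Job built from that namedtuple can be read by field name and still behaves like a plain tuple, which is why sorted() works on it directly. The values here are made up.

```python
from collections import namedtuple

Job = namedtuple("Job", "runtime id event_type")

job = Job(runtime=4.2, id="job-7", event_type="done")
print(job.runtime, job.id, job.event_type)  # access by field name
print(job[0] == job.runtime)                # index access still works
runtime, job_id, event_type = job           # and it unpacks like a tuple
```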

Then your code could become:

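import csv
import glob
import gzip
import os

jobs = []
for filePath in glob.glob(os.path.join(path1, '*.gz')):
    with gzip.open(filePath, 'rt', newline="") as file:
        reader = csv.reader(file)
        for row in reader:
            # csv gives strings; float() makes runtimes compare numerically
            runTime = float(row[0])
            ID = row[2]
            eventType = row[3]
            job = Job(runTime, ID, eventType)
            jobs.append(job)

jobs = sorted(jobs)  # namedtuples sort by their first field, runtime
n_jobs = len({job.id for job in jobs})  # count distinct IDs, not lines
print("There are %s unique jobs." % (n_jobs))
for i, job in enumerate(jobs[:20], start=1):
    print("#%s\t%s\t%s\t%s" % (i, job.runtime, job.id, job.event_type))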

It's worth noting that this sorting works because tuples (including namedtuples) are compared element by element: first by their first element, the runtime, and, if there is a tie, by the next element, and so on.
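A small check of that ordering rule with hypothetical Job values: the two jobs tied at 2.0 are ordered by id, the next field. If you only care about runtime and want ties to keep their original order, passing key=lambda job: job.runtime compares the runtime alone, and Python's sort is stable.

```python
from collections import namedtuple

Job = namedtuple("Job", "runtime id event_type")

jobs = [
    Job(5.0, "job-c", "done"),
    Job(2.0, "job-b", "done"),
    Job(2.0, "job-a", "failed"),
]
by_tuple = sorted(jobs)                                 # runtime, then id, then event_type
by_runtime = sorted(jobs, key=lambda job: job.runtime)  # runtime only; ties keep input order
print([job.id for job in by_tuple])
print([job.id for job in by_runtime])
```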

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 