How do you combine three lists in Python using dict?

I need to read through lines in multiple files. The first value in each line is the runtime, the third is the job ID, and the fourth is the status. I have created lists to store each of these values, but I don't understand how to connect the lists and sort them so that I get the 20 lines with the fastest runtimes. Does anybody have a suggestion for how I can do that? Thank you!

for filePath in glob.glob(os.path.join(path1, '*.gz')):
    with gzip.open(filePath, 'rt', newline="") as file:
        reader = csv.reader(file)
        for line in file:
            for row in reader:
                runTime = row[0]
                ID = row[2]
                eventType = row[3]
                jobList.append(ID)
                timeList.append(runTime)
                eventList.append(eventType)

    jobList = sorted(set(jobList))
    counter = len(jobList)
    print ("There are %s unique jobs." % (counter))
    i = 1
    while i < 21:
        print("#%s\t%s\t%s\t%s" % (i, timeList[i], jobList[i], eventList[i]))
        i = i + 1

Instead of using three different lists, you can use a single list and append tuples to it, like so:

combinedList.append((runTime, ID, eventType))

You can then sort the combinedList of tuples as shown in the question "How to sort (list/tuple) of lists/tuples?".
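As a minimal sketch of that approach: the rows below are made-up sample data, and converting the runtime with float() assumes the first column is numeric. Values read by csv.reader arrive as strings, which would otherwise sort as text.

```python
from operator import itemgetter

# Hypothetical sample rows in the (runTime, ID, eventType) layout used above.
rows = [
    ("12.5", "job-3", "FINISH"),
    ("3.0", "job-1", "FINISH"),
    ("100.25", "job-2", "FAIL"),
]

combinedList = []
for runTime, ID, eventType in rows:
    # Assumption: runtimes are numeric, so convert before sorting.
    combinedList.append((float(runTime), ID, eventType))

# Sort by the first tuple element (the runtime), smallest first.
combinedList.sort(key=itemgetter(0))

# Slicing is safe even when there are fewer than 20 entries.
top20 = combinedList[:20]
for rank, (runTime, ID, eventType) in enumerate(top20, start=1):
    print("#%s\t%s\t%s\t%s" % (rank, runTime, ID, eventType))
```

Because each tuple keeps the runtime, ID, and status together, sorting can no longer pull the three values out of step the way sorting jobList on its own does.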

You can make further improvements, such as using namedtuples. Look them up on SO or Google.

Note: there may be other, more efficient ways to do this. For example, you can use Python's heapq library and keep a heap of size 20 to track the top 20 run times. You can learn more about it on Python's website or Stack Overflow, but you may need some more algorithmic background.
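One way to sketch the heapq idea is with heapq.nsmallest, which does the bounded-heap bookkeeping for you. The tuples here are hypothetical sample data with runtimes already converted to float, and n is 2 only to keep the example short; in the question it would be 20.

```python
import heapq

# Hypothetical (runTime, ID, eventType) tuples; runtimes already converted to float.
combinedList = [
    (12.5, "job-3", "FINISH"),
    (3.0, "job-1", "FINISH"),
    (100.25, "job-2", "FAIL"),
    (7.75, "job-4", "FINISH"),
]

# nsmallest returns the n smallest items in ascending order
# without sorting the whole list.
fastest = heapq.nsmallest(2, combinedList, key=lambda job: job[0])
print(fastest)
```

For a few thousand rows a plain sort is likely just as good; the heap mainly pays off when the input is large and only a small top-N is needed.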

Instead of maintaining three lists jobList, timeList, eventList, you can store (runTime, eventType) tuples in a dictionary, using ID as key, by replacing

jobList = []
timeList = []
eventList = []
…
jobList.append(ID)
timeList.append(runTime)
eventList.append(eventType)

with

jobs = {}  # an empty dictionary
…
jobs[ID] = (runTime, eventType)

To loop over that dictionary sorted by increasing runTime values:

for ID, (runTime, eventType) in sorted(jobs.items(), key=lambda item: item[1][0]):
    print(ID, runTime, eventType)  # do something with it
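Put together, the dictionary version might look like the sketch below. The rows are hypothetical stand-ins for what csv.reader yields, the float() conversion assumes the runtime column is numeric, and note that a job ID appearing more than once keeps only its last row.

```python
# Hypothetical rows in the same column layout as the question:
# row[0] = runtime, row[2] = job ID, row[3] = status.
rows = [
    ["12.5", "x", "job-3", "FINISH"],
    ["3.0", "x", "job-1", "FINISH"],
    ["100.25", "x", "job-2", "FAIL"],
    ["4.0", "x", "job-3", "FINISH"],  # job-3 appears twice
]

jobs = {}
for row in rows:
    # Assumption: runtimes are numeric.
    # A repeated ID overwrites the earlier entry for that ID.
    jobs[row[2]] = (float(row[0]), row[3])

# Sort the (ID, (runTime, eventType)) pairs by runTime, smallest first.
ranked = sorted(jobs.items(), key=lambda item: item[1][0])
for ID, (runTime, eventType) in ranked[:20]:
    print(ID, runTime, eventType)
```

If every row must be kept even when IDs repeat, a list of tuples is the safer structure.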

Python's built-in sorted would work better for you if you kept runTime, ID, and eventType together in one data structure. I would recommend using a namedtuple, as it lets you be clear about what each value is. You can do the following:

from collections import namedtuple
Job = namedtuple("Job", "runtime id event_type")

Then your code could change to:

import csv
import glob
import gzip
import os

jobs = []
for filePath in glob.glob(os.path.join(path1, '*.gz')):
    with gzip.open(filePath, 'rt', newline="") as file:
        reader = csv.reader(file)
        for row in reader:
            # Assumes the runtime column is numeric; as a string it would sort as text.
            job = Job(float(row[0]), row[2], row[3])
            jobs.append(job)

# Sort once, after all files have been read.
jobs = sorted(jobs)
n_jobs = len({job.id for job in jobs})
print("There are %s unique jobs." % n_jobs)
for i, job in enumerate(jobs[:20], start=1):
    print("#%s\t%s\t%s\t%s" % (i, job.runtime, job.id, job.event_type))

It's worth noting that this sorting works properly because, by default, tuples are sorted by their first element. If there is a tie, the comparison moves on to the next elements of the tuple.
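A small sketch of that comparison rule, using made-up jobs where two runtimes tie. The last two lines illustrate a related pitfall under the assumption that runtimes are numeric: left as strings, they are compared character by character.

```python
from collections import namedtuple

Job = namedtuple("Job", "runtime id event_type")

# Hypothetical jobs; two of them tie on runtime.
jobs = [
    Job(5.0, "job-b", "FINISH"),
    Job(2.0, "job-c", "FAIL"),
    Job(5.0, "job-a", "FINISH"),
]

# Tuples compare element by element, so the tie on 5.0 is broken by id.
ordered = sorted(jobs)
print([job.id for job in ordered])

# Runtimes left as strings compare character by character, not by value.
print(sorted(["9.5", "10.0"]))
print(sorted([9.5, 10.0]))
```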
