
Combine elements in list of Tuples?

I'm working on a program that takes in an IMDb text file and outputs the top actors (by movie appearances) based on the user input N.

However, I'm running into an issue where slots are taken up by actors who appear in the same number of movies, which I need to avoid. Rather, if two actors are in 5 movies, for example, the number 5 should appear once and the actors' names should be combined, separated by a semicolon.

I've tried multiple workarounds and nothing has worked yet. Any suggestions?

if __name__ == "__main__":
    imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
    print imdb_file
    N= input('Enter the number of top individuals ==> ')
    print N


    actors_to_movies = {}

    for line in open(imdb_file):
        words = line.strip().split('|')
        actor = words[0].strip()
        movie = words[1].strip()
        if not actor in actors_to_movies:
            actors_to_movies[actor] = set()
        actors_to_movies[actor].add(movie)

    movie_list= sorted(list(actors_to_movies[actor])) 

    #Arranges Dictionary into List of Tuples#
    D = [ (x, actors_to_movies[x]) for x in actors_to_movies]
    descending = sorted(D, key = lambda x: len(x[1]), reverse=True)

    #Prints Tuples in Descending Order N number of times (User Input)#
    for i in range(N):
        print str(len(descending[i][1]))+':', descending[i][0]

There is a useful function for this, itertools.groupby.

It lets you split a list into groups by some key. It only merges items that are adjacent, so the list has to be sorted by that key first. Using it you can easily write a function that prints the top actors:

import itertools

def print_top_actors(actor_info_list, top=5):
    """
    :param: actor_info_list should contain tuples of (actor_name, movie_count)
    """
    # groupby only merges adjacent items, so sort by movie count first
    actor_info_list.sort(key=lambda x: x[1], reverse=True)
    # group on the movie count; without key= every tuple would form its own group
    groups = itertools.groupby(actor_info_list, key=lambda x: x[1])
    for i, (movie_count, actor_iter) in enumerate(groups):
        if i >= top:
            break
        print movie_count, ';'.join(actor for actor, _ in actor_iter)

And an example of usage:

>>> print_top_actors(
...     [
...         ("DiCaprio", 100500),
...         ("Pitt", 100500),
...         ("foo", 10),
...         ("bar", 10),
...         ("baz", 10),
...         ("qux", 3),
...         ("lol", 1)
...     ], top = 3)
100500 DiCaprio;Pitt
10 foo;bar;baz
3 qux
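
To connect this to the dictionary built in the question, the same grouping can be applied directly to actors_to_movies. The following is only a sketch, written for Python 3 rather than the Python 2 used above; the function name top_actor_groups and the sample data are made up for illustration, and ties are ordered alphabetically as an assumption about the desired output.

```python
import itertools

def top_actor_groups(actors_to_movies, top):
    """Return up to `top` (movie_count, [actor names]) pairs, highest count first.

    `actors_to_movies` maps an actor name to a set of movie titles,
    as in the question. The function name is hypothetical.
    """
    # Sort by descending movie count, then by name so ties have a stable order.
    ranked = sorted(actors_to_movies.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    groups = []
    # groupby only merges adjacent items, which is why the list is sorted first.
    for count, members in itertools.groupby(ranked, key=lambda item: len(item[1])):
        groups.append((count, [actor for actor, _movies in members]))
    return groups[:top]

# Made-up sample data, not taken from a real IMDb file.
sample = {
    "Actor A": {"Movie 1", "Movie 2"},
    "Actor B": {"Movie 2", "Movie 3"},
    "Actor C": {"Movie 1"},
}

for count, names in top_actor_groups(sample, 2):
    print("{}: {}".format(count, "; ".join(names)))
# prints:
# 2: Actor A; Actor B
# 1: Actor C
```

Here N counts distinct movie counts rather than individual actors, so two actors tied at the same count share a single slot.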
