How can I make the following Python program (code) more efficient?

Question

Is there an efficient way to solve the following problem, assuming the data is large? I have solved the problem, but how can I improve the code to make it more efficient? Any suggestions?

Data:

movie_sub_themes = {
'Epic': ['Ben Hur', 'Gone With the Wind', 'Lawrence of Arabia'],
'Spy': ['James Bond', 'Salt', 'Mission: Impossible'],
'Superhero': ['The Dark Knight Trilogy', 'Hancock, Superman'],
'Gangster': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
'Fairy Tale': ['Maleficent', 'Into the Woods', 'Jack the Giant Killer'],
'Romantic':['Casablanca', 'The English Patient', 'A Walk to Remember'],
'Epic Fantasy': ['Lord of the Rings', 'Chronicles of Narnia', 'Beowulf']}

movie_themes = {
'Action': ['Epic', 'Spy', 'Superhero'],
'Crime' : ['Gangster'],
'Fantasy' : ['Fairy Tale', 'Epic Fantasy'],
'Romance' : ['Romantic']}

themes_keys = movie_themes.keys()
theme_movies_keys = movie_sub_themes.keys()

#Iterate in movie_themes
#Check movie_themes keys in movie_sub_keys
#if yes append the movie_sub_keys into the newdict
newdict = {}
for i in range(len(themes_keys)):
   a = []
   for j in range(len(movie_themes[themes_keys[i]])):
     try:
         if movie_themes[themes_keys[i]][j] in theme_movies_keys:
            a.append(movie_sub_themes[movie_themes[themes_keys[i]][j]])
     except:
         pass
   newdict[themes_keys[i]] = a

# newdict contains nested lists
# Program to unpack the nested list into single list
# Storing the value into theme_movies_data 
theme_movies_data = {}
for k, v in newdict.iteritems():
    mylist_n = [j for i in v for j in i]
    theme_movies_data[k] = dict.fromkeys(mylist_n).keys()

print (theme_movies_data)

Output:

{'Action': ['Gone With the Wind', 'Ben Hur','Hancock, Superman','Mission: Impossible','James Bond','Lawrence of Arabia','Salt','The Dark Knight Trilogy'],
 'Crime': ['City of God', 'Reservoir Dogs', 'Gangs of New York'],
 'Fantasy': ['Jack the Giant Killer','Beowulf','Into the Woods','Maleficent','Lord of the Rings','Chronicles of Narnia'],
 'Romance': ['The English Patient', 'A Walk to Remember', 'Casablanca']}

Apologies for not properly commenting the code.

I am more concerned about the running time.

Thank you.

Answer 1

You could use a relational database to store two tables: one of movies and their sub-themes, and one relating sub-themes to movie themes. You could then use SQL to query the database, selecting a list of all movies and their associated movie themes.

This approach is likely to be more efficient for large data, because database engines plan and optimize SQL queries for processing speed. The relational model also scales well, so it can handle very large datasets with little overhead.

For an example of creating and using a simple database in Python, see here. If you are not familiar with SQL operations, see here for a simple tutorial on the useful operations.
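
As a rough illustration of the two-table idea, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names (movies, sub_themes), the column names, and the reduced sample data are assumptions made for this example, not something the answer specifies.

```python
import sqlite3

# Reduced sample of the question's data, to keep the sketch short.
movie_sub_themes = {
    'Epic': ['Ben Hur', 'Gone With the Wind', 'Lawrence of Arabia'],
    'Gangster': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
}
movie_themes = {'Action': ['Epic'], 'Crime': ['Gangster']}

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE movies (title TEXT, sub_theme TEXT)')
conn.execute('CREATE TABLE sub_themes (sub_theme TEXT, theme TEXT)')

# One row per (movie, sub-theme) and one row per (sub-theme, theme).
conn.executemany(
    'INSERT INTO movies VALUES (?, ?)',
    [(title, sub) for sub, titles in movie_sub_themes.items() for title in titles],
)
conn.executemany(
    'INSERT INTO sub_themes VALUES (?, ?)',
    [(sub, theme) for theme, subs in movie_themes.items() for sub in subs],
)

# Join the two tables; DISTINCT drops movies repeated within a theme.
rows = conn.execute(
    'SELECT DISTINCT s.theme, m.title '
    'FROM sub_themes AS s JOIN movies AS m ON m.sub_theme = s.sub_theme '
    'ORDER BY s.theme, m.title'
).fetchall()
conn.close()

# Group the (theme, title) rows back into a dict of lists.
theme_movies = {}
for theme, title in rows:
    theme_movies.setdefault(theme, []).append(title)

print(theme_movies)
```

For data that does not fit in memory, the same schema could live in a file-backed database, and an index on the sub_theme columns would probably help the join.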

Answer 2

Here's my solution (using defaultdict):

movie_sub_themes = {
'Epic': ['Ben Hur', 'Gone With the Wind', 'Lawrence of Arabia'],
'Spy': ['James Bond', 'Salt', 'Mission: Impossible'],
'Superhero': ['The Dark Knight Trilogy', 'Hancock, Superman'],
'Gangster': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
'Fairy Tale': ['Maleficent', 'Into the Woods', 'Jack the Giant Killer'],
'Romantic':['Casablanca', 'The English Patient', 'A Walk to Remember'],
'Epic Fantasy': ['Lord of the Rings', 'Chronicles of Narnia', 'Beowulf']}

movie_themes = {
'Action': ['Epic', 'Spy', 'Superhero'],
'Crime' : ['Gangster'],
'Fantasy' : ['Fairy Tale', 'Epic Fantasy'],
'Romance' : ['Romantic']}

from collections import defaultdict
newdict = defaultdict(list)

for theme, sub_themes_list in movie_themes.items():
    for sub_theme in sub_themes_list:
        newdict[theme] += movie_sub_themes.get(sub_theme, [])       

dict(newdict)

>> {'Action': ['Ben Hur',
  'Gone With the Wind',
  'Lawrence of Arabia',
  'James Bond',
  'Salt',
  'Mission: Impossible',
  'The Dark Knight Trilogy',
  'Hancock, Superman'],
 'Crime': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
 'Fantasy': ['Maleficent',
  'Into the Woods',
  'Jack the Giant Killer',
  'Lord of the Rings',
  'Chronicles of Narnia',
  'Beowulf'],
 'Romance': ['Casablanca', 'The English Patient', 'A Walk to Remember']}
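
One difference from the question's code: the original removed duplicate titles with dict.fromkeys(...).keys(), while the loop above keeps them. If deduplication matters, a variant along these lines might work. It is only a sketch, assuming Python 3.7+ (where dicts preserve insertion order); the function name movies_by_theme and the sample data with a repeated title are made up for illustration.

```python
from itertools import chain

def movies_by_theme(movie_themes, movie_sub_themes):
    """Map each theme to its movies, dropping duplicates but keeping order."""
    result = {}
    for theme, sub_themes in movie_themes.items():
        # Lazily concatenate the movie lists of every known sub-theme.
        titles = chain.from_iterable(
            movie_sub_themes.get(sub_theme, []) for sub_theme in sub_themes
        )
        # dict.fromkeys keeps the first occurrence of each title, in order.
        result[theme] = list(dict.fromkeys(titles))
    return result

# Hypothetical sample: 'Ben Hur' appears under two sub-themes,
# and 'Missing' has no entry in the sub-theme mapping.
sample_sub_themes = {
    'Epic': ['Ben Hur', 'Lawrence of Arabia'],
    'Spy': ['James Bond', 'Ben Hur'],
}
sample_themes = {'Action': ['Epic', 'Spy', 'Missing']}

result = movies_by_theme(sample_themes, sample_sub_themes)
print(result)
# {'Action': ['Ben Hur', 'Lawrence of Arabia', 'James Bond']}
```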

timings: 4.84 µs vs 14.6 µs
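
To reproduce timings like these on your own machine, something like the following timeit harness could be used. It is a sketch, not the measurement method used above: the helper names with_defaultdict and with_comprehension, the reduced sample data, and the iteration count are all assumptions, and the numbers will vary by machine and Python version.

```python
import timeit
from collections import defaultdict

# Reduced sample of the question's data.
movie_sub_themes = {
    'Epic': ['Ben Hur', 'Gone With the Wind', 'Lawrence of Arabia'],
    'Spy': ['James Bond', 'Salt', 'Mission: Impossible'],
    'Gangster': ['Gangs of New York', 'City of God', 'Reservoir Dogs'],
}
movie_themes = {'Action': ['Epic', 'Spy'], 'Crime': ['Gangster']}

def with_defaultdict():
    # Same loop as in the answer above.
    newdict = defaultdict(list)
    for theme, sub_themes_list in movie_themes.items():
        for sub_theme in sub_themes_list:
            newdict[theme] += movie_sub_themes.get(sub_theme, [])
    return dict(newdict)

def with_comprehension():
    # Alternative for comparison: a nested comprehension per theme.
    return {
        theme: [m for s in subs for m in movie_sub_themes.get(s, [])]
        for theme, subs in movie_themes.items()
    }

n = 10000  # number of calls per measurement
for fn in (with_defaultdict, with_comprehension):
    total = timeit.timeit(fn, number=n)
    print(f'{fn.__name__}: {total / n * 1e6:.2f} us per call')
```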
