
Python: Why do I not need 2 variables when unpacking a dictionary?

movie_dataset = {'Avatar': [0.01940156245995175, 0.4812286689419795, 0.9213483146067416], "Pirates of the Caribbean: At World's End": [0.02455894456664483, 0.45051194539249145, 0.898876404494382], 'Spectre': [0.02005646812429373, 0.378839590443686, 0.9887640449438202], ... }

movie_ratings = {'Avatar': 7.9, "Pirates of the Caribbean: At World's End": 7.1, 'Spectre': 6.8, ...}

def distance(movie1, movie2):
  squared_difference = 0
  for i in range(len(movie1)):
    squared_difference += (movie1[i] - movie2[i]) ** 2
  final_distance = squared_difference ** 0.5
  return final_distance

def predict(unknown, dataset, movie_ratings, k):
  distances = []
  #Looping through all points in the dataset
  for title in dataset:
    movie = dataset[title]
    distance_to_point = distance(movie, unknown)
    #Adding the distance and point associated with that distance
    distances.append([distance_to_point, title])
  distances.sort()
  #Taking only the k closest points
  neighbors = distances[0:k]
  total_rating = 0
  for i in neighbors[1]:
    total_rating += movie_ratings[i]  # <----- Why is this an error?
  return total_rating / len(neighbors)  # <----- Why can I not divide by total rating
  #total_rating = 0
  #for i in neighbors:
    # title = neighbors[1]
    #total_rating += movie_ratings[title]  <----- Why is this not an error?
  #return total_rating / len(neighbors)

print(movie_dataset["Life of Pi"])
print(movie_ratings["Life of Pi"])
print(predict([0.016, 0.300, 1.022], movie_dataset, movie_ratings, 5))

Two questions here. First, why is this an error?

for i in neighbors[1]:
    total_rating += movie_ratings[i]

It seems to be the same as

for i in neighbors:
    title = neighbors[1]
    total_rating += movie_ratings[title]

Second, why can I not divide by len(total_rating)?

Second question first, because it's more straightforward:

Second, why can I not divide by len(total_rating)?

You're trying to compute an average, right? So you want the sum of the ratings divided by the number of ratings?

Okay. So, you're trying to figure out how many ratings there are. What's the rule that tells you that? It seems like you're expecting to count up the ratings from where they are stored. Where are they stored? Not in total_rating; that's where you stored the numerical sum, and a number has no len(). Where did the ratings come from? They came from looking up movie titles in movie_ratings. So the ratings were never actually stored anywhere; there is nothing to take the len of. Right?

Well, not quite. What is the rule that determines which ratings we are adding up? We are looking them up in movie_ratings by title. So how many of them are there? As many as there are titles. Where were the titles stored? They were paired up with distances in neighbors. So there are as many titles as there are neighbors (whatever "neighbor" is supposed to mean here; I don't really understand why you called it that). So that is what you want the len() of: len(neighbors).
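To make the counting argument concrete, here is a minimal sketch. It assumes neighbors holds three [distance, title] pairs shaped like the ones predict() builds; the pair variable name is just for illustration.

```python
# Sample data: three [distance, title] pairs and the matching ratings.
movie_ratings = {
    "Avatar": 7.9,
    "Pirates of the Caribbean: At World's End": 7.1,
    "Spectre": 6.8,
}
neighbors = [
    [0.08565491616637051, "Spectre"],
    [0.1946446017955758, "Pirates of the Caribbean: At World's End"],
    [0.20733104650812437, "Avatar"],
]

total_rating = 0
for pair in neighbors:
    total_rating += movie_ratings[pair[1]]

# total_rating is a plain number, so there is nothing to take the len() of.
try:
    len(total_rating)
    len_failed = False
except TypeError:
    len_failed = True

# One rating was added per neighbor, so len(neighbors) is the count we need.
average = total_rating / len(neighbors)
print(len_failed, average)
```

The count lives in the list that drove the lookups, not in the running sum.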

Onward to fixing the summation.

total_rating = 0
for i in neighbors[1]:
    total_rating += movie_ratings[i]

First, this computes neighbors[1], which will be one of the [distance_to_point, title] pairs that was appended to the list with .append (assuming there are at least two such pairs, so that the [1] index is valid).

Then, the loop would iterate over that two-element list: the first time, i is equal to the distance value, and the second time it would be equal to the title. It never gets that far, though. On the first pass, movie_ratings[i] looks up a float distance, which is not a key in the dictionary, so Python raises a KeyError.
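A small sketch of what that loop actually sees, assuming the same [distance, title] pair layout as in predict(). The seen and error_name variables are only there to record what happens.

```python
movie_ratings = {
    "Avatar": 7.9,
    "Pirates of the Caribbean: At World's End": 7.1,
    "Spectre": 6.8,
}
neighbors = [
    [0.08565491616637051, "Spectre"],
    [0.1946446017955758, "Pirates of the Caribbean: At World's End"],
    [0.20733104650812437, "Avatar"],
]

pair = neighbors[1]  # a single [distance, title] pair, not a list of titles
seen = []
error_name = None
try:
    for i in pair:
        seen.append(i)
        movie_ratings[i]  # first pass: i is the float distance, not a title
except KeyError as exc:
    error_name = type(exc).__name__

print(seen, error_name)
```

Only the distance ends up in seen, because the lookup fails before the loop reaches the title.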

total_rating = 0
for i in neighbors:
    title = neighbors[1]
    total_rating += movie_ratings[title]

This loop makes i take on each of the pairs as a value. But title = neighbors[1] is broken: it ignores the i value completely and always uses the same specific pair, and it then tries to use that pair (which is a list) as a title (we need a string).
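With data shaped like the question's, this second version does not run cleanly either. The sketch below uses a cut-down, illustrative copy of the data and records the exception type in a hypothetical error_name variable.

```python
movie_ratings = {"Avatar": 7.9, "Spectre": 6.8}
neighbors = [
    [0.08565491616637051, "Spectre"],
    [0.20733104650812437, "Avatar"],
]

error_name = None
try:
    for i in neighbors:
        title = neighbors[1]   # always the same pair; i is never used
        movie_ratings[title]   # a list is unhashable, so it cannot be a dict key
except TypeError as exc:
    error_name = type(exc).__name__

print(error_name)
```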

What you presumably wanted is:

total_rating = 0
for neighbor in neighbors:
    title = neighbor[1]
    total_rating += movie_ratings[title]

Notice I use a clearer name for the loop variable, to avoid confusion. neighbor is one of the values from the neighbors list, i.e., one of the distance-title pairs. From that, we can get the title, and then from the ratings data and the title, we can get the rating.

I can make it clearer, by using sequence unpacking:

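total_rating = 0
for neighbor in neighbors:
    # Named dist rather than distance, so that inside predict() it does not
    # shadow the distance() function.
    dist, title = neighbor
    total_rating += movie_ratings[title]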

Instead of having to understand the reason for a [1] index, now we label each part of the neighbor value, and then use the one that's relevant for our purpose.

I can make it simpler, by doing the unpacking right away:

total_rating = 0
for dist, title in neighbors:
    total_rating += movie_ratings[title]

I can make it more elegant, by not trying to explain to Python how to do sums, and just telling it what to sum:

total_rating = sum(movie_ratings[title] for dist, title in neighbors)

This uses a generator expression along with the built-in sum function, which does exactly what it sounds like.
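Putting the pieces together, one possible shape for the whole function is sketched below. It is not the only way to write it. It assumes only the three movies whose feature values appear in the question (the rest of the data set is elided there), so it uses k=2 instead of 5, and it stores the pairs as tuples rather than lists, which unpack the same way.

```python
def distance(movie1, movie2):
    """Euclidean distance between two equal-length feature lists."""
    return sum((a - b) ** 2 for a, b in zip(movie1, movie2)) ** 0.5


def predict(unknown, dataset, movie_ratings, k):
    """Average the ratings of the k movies closest to `unknown`."""
    # Build (distance, title) pairs and sort them by distance.
    distances = sorted(
        (distance(features, unknown), title)
        for title, features in dataset.items()
    )
    neighbors = distances[:k]
    # dist is unpacked only to label the pair; the rating lookup needs title.
    total_rating = sum(movie_ratings[title] for dist, title in neighbors)
    return total_rating / len(neighbors)


# Only the three entries listed in the question; the full data set is larger.
movie_dataset = {
    "Avatar": [0.01940156245995175, 0.4812286689419795, 0.9213483146067416],
    "Pirates of the Caribbean: At World's End": [
        0.02455894456664483, 0.45051194539249145, 0.898876404494382],
    "Spectre": [0.02005646812429373, 0.378839590443686, 0.9887640449438202],
}
movie_ratings = {
    "Avatar": 7.9,
    "Pirates of the Caribbean: At World's End": 7.1,
    "Spectre": 6.8,
}

prediction = predict([0.016, 0.300, 1.022], movie_dataset, movie_ratings, 2)
print(prediction)
```

With these three movies the two nearest are Spectre and Pirates of the Caribbean: At World's End, so the result is the mean of 6.8 and 7.1, up to floating-point rounding.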

distances is generated in the form:

[
[0.08565491616637051, 'Spectre'],
[0.1946446017955758, "Pirates of the Caribbean: At World's End"],
[0.20733104650812437, 'Avatar']
]

which is what neighbors is derived from, and the names are in position 1 of each list. neighbors[1] would just retrieve [0.1946446017955758, "Pirates of the Caribbean: At World's End"], a single element, which doesn't look like what you want. It would try to use 0.19... and Pirates... as keys in the dict movie_ratings.

I'm guessing you want this, to average the ratings of the closest movies, as ranked by the distance values computed from the dataset:

  for dist, name in neighbors:
    total_rating += movie_ratings[name]
  return total_rating / len(neighbors)
