Fuzzy matching lists of people

Question

I am trying to determine whether a movie is the same across two pages, and I would like to compare the actors as one of the criteria. However, actors are often listed differently on different pages. For example:

On this page, https://play.google.com/store/movies/details?id=cSdcb2KOH74 , the actors are listed as "Mikhail Galustyan, Danny Trejo, Guillermo Díaz, Oleg Taktarov, Kym Whitley, Christopher Robin Miller, Robert Bear, Vladimir Yaglych, Josh McLerran"
On this page, http://www.imdb.com/title/tt2167970/ , the actors are listed as "Ivan Stebunov, Ingrid Olerinskaya, Vladimir Yaglych"

Previously, I was doing a very rough match on the first actor only:

if actors_from_site_1[0] == actors_from_site_2[0]:

But, as you can see from the above case, this isn't a good technique. What would be a better technique to see if the actors from one film match those from the other?

Answer 1

You could check the length of a set intersection of the two sets of actors.

if len(set(actors_from_site_1).intersection(set(actors_from_site_2))):

or you could do something like:

if any(actor in actors_from_site_1 for actor in actors_from_site_2):
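
As a minimal sketch of the intersection check, here it is applied to the two cast lists quoted in the question. The `shared` variable name is illustrative, and the comparison is exact, so case and spelling have to match:

```python
# Cast lists copied from the two pages quoted in the question.
actors_from_site_1 = [
    "Mikhail Galustyan", "Danny Trejo", "Guillermo Díaz", "Oleg Taktarov",
    "Kym Whitley", "Christopher Robin Miller", "Robert Bear",
    "Vladimir Yaglych", "Josh McLerran",
]
actors_from_site_2 = ["Ivan Stebunov", "Ingrid Olerinskaya", "Vladimir Yaglych"]

# Names that appear in both lists (exact string comparison).
shared = set(actors_from_site_1) & set(actors_from_site_2)

# An empty set is falsy, so this branch runs only when at least one actor overlaps.
if shared:
    print("Shared actors:", sorted(shared))
```

With these two lists the only overlap is Vladimir Yaglych, which the original first-element comparison would have missed.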

Answer 2

If all the lists are comma-separated actor names, split them on the commas, lowercase the names, and get the intersection:

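# strip() removes the space left after each comma, so " danny trejo" matches "danny trejo"
actors_from_site_1 = {name.strip() for name in actors_from_site_1.lower().split(',')}
actors_from_site_2 = {name.strip() for name in actors_from_site_2.lower().split(',')}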

common_actors = actors_from_site_1 & actors_from_site_2
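
Lowercasing alone still treats "Guillermo Díaz" and "Guillermo Diaz" as different people. One possible extension, sketched here under the assumption that accents and stray whitespace should be ignored, uses the standard `unicodedata` module. The `normalize_name` and `common_actors_normalized` helpers are hypothetical names, not part of the answer above:

```python
import unicodedata

def normalize_name(name):
    """Hypothetical helper: trim, case-fold, and strip accents from a name."""
    decomposed = unicodedata.normalize("NFKD", name.strip().casefold())
    # Drop combining marks so "díaz" and "diaz" compare equal.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def common_actors_normalized(cast_1, cast_2):
    """Return normalized names present in both comma-separated cast strings."""
    names_1 = {normalize_name(n) for n in cast_1.split(",") if n.strip()}
    names_2 = {normalize_name(n) for n in cast_2.split(",") if n.strip()}
    return names_1 & names_2

print(common_actors_normalized("Danny Trejo, Guillermo Díaz",
                               "guillermo diaz,Oleg Taktarov"))
```

Note that the result contains the normalized spellings, so keep a mapping back to the original names if you need to display them.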

Answer 3

Try:

similaractors = []
for actor in actors_from_site_1:
    if actor in actors_from_site_2:
        similaractors.append(actor)

Then, you have similaractors as a list of all the actors they share. Call len(similaractors) to get the number of shared actors, and then you can print(similaractors) and do everything else you might do with a list.
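
The loop above only finds exact matches. If the two sites spell a name slightly differently, one option is the standard library's `difflib.get_close_matches`. This is a swapped-in technique that none of the answers propose, so treat it as a rough sketch: the `fuzzy_shared_actors` name is hypothetical and the 0.85 cutoff is an arbitrary value you would need to tune.

```python
from difflib import get_close_matches

def fuzzy_shared_actors(cast_1, cast_2, cutoff=0.85):
    """Return (name_1, name_2) pairs whose similarity ratio is at least cutoff."""
    # Map lowercased names back to their original spelling on site 2.
    lowered_2 = {name.lower(): name for name in cast_2}
    pairs = []
    for name in cast_1:
        # Ask for the single best candidate at or above the cutoff, if any.
        matches = get_close_matches(name.lower(), list(lowered_2), n=1, cutoff=cutoff)
        if matches:
            pairs.append((name, lowered_2[matches[0]]))
    return pairs

# "Yaglich" is a made-up misspelling used only to exercise the fuzzy match.
print(fuzzy_shared_actors(["Vladimir Yaglych", "Danny Trejo"],
                          ["Vladimir Yaglich", "Ivan Stebunov"]))
```

A lower cutoff catches more spelling variants but also raises the chance of pairing two different people with similar names.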

3 answers

Answer 1
2, accepted, 2015-04-01 01:45:30

Answer 2
1, 2015-04-01 01:42:16

Answer 3
1, 2015-04-01 02:09:10
