How to scrape movie information from the IMDB website?

Question

I am new to Python and trying to scrape IMDB. I am scraping a list of the top 250 IMDB movies and want to get information from each movie's own page, for example the length of each movie.

I already have a list of unique URLs. I want to loop over this list and, for every URL in it, retrieve the 'length' of that movie. Is it possible to do this in one piece of code?

for URL in urlofmovie:
    htmlsource = requests.get(URL)
    tree_url = html.fromstring(htmlsource)
    lengthofmovie = tree_url.xpath('//*[@class="subtext"]')

I expect lengthofmovie to become a list of the lengths of all the movies. However, it already goes wrong at line 2: the htmlsource.

Answer 1

To make it a list, you should first create an empty list and then append each length to it.

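import requests
from lxml import html  # assumed: html here is lxml.html

length_list = []
for URL in urlofmovie:
    htmlsource = requests.get(URL)
    # fromstring() needs the page text or bytes, not the Response object
    tree_url = html.fromstring(htmlsource.content)
    length_list.append(tree_url.xpath('//*[@class="subtext"]'))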
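
The create-then-append pattern can be tried without network access by separating the parsing step from the download step. The following is only a sketch under stated assumptions: it uses the standard library's html.parser in place of lxml, the names SubtextCollector and extract_subtext are hypothetical, the sample HTML is made up, and it assumes the runtime sits inside an element with class "subtext" as in the XPath from the question (the live site's markup may differ).

```python
from html.parser import HTMLParser


class SubtextCollector(HTMLParser):
    """Collect the text inside every element whose class list contains 'subtext'."""

    # Tags that never get a closing tag, so they must not change the depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside a matching element
        self.chunks = []    # text pieces of the element being read
        self.results = []   # one cleaned string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "subtext" in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.results.append(" ".join("".join(self.chunks).split()))
            self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_subtext(page_source):
    """Return the text of each 'subtext' element found in page_source."""
    parser = SubtextCollector()
    parser.feed(page_source)
    parser.close()
    return parser.results


# Made-up page sources standing in for downloaded movie pages.
sample_pages = [
    '<div class="subtext">R | <time> 2h 22min </time> | Drama</div>',
    '<div class="subtext"><time>2h 55min</time></div><p class="other">x</p>',
]

length_list = []  # create the list first
for page_source in sample_pages:
    length_list.extend(extract_subtext(page_source))  # then add each result

print(length_list)
```

Using extend rather than append keeps length_list flat: each page contributes its strings directly instead of a nested list per page.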

Small tip: since you are new to Python, I would suggest going over the PEP8 conventions. Consistent variable naming can make your (and other developers') life easier, for example urlofmovie -> urls_of_movies.

"However, it already goes wrong at line 2: the htmlsource."

Please provide the exception you are receiving.
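
One way to gather that information is to wrap each download in try/except and record the exception type and message per URL. This is a minimal sketch, not the asker's code: download and fetch_all are hypothetical helper names, and it uses urllib.request from the standard library rather than requests. The fetch parameter is injectable so the error handling can be tried without network access.

```python
import urllib.request


def download(url, timeout=10):
    """Return the decoded body of url (standard-library stand-in for requests.get)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch=download):
    """Fetch every URL; return (pages, errors) so failures can be reported."""
    pages, errors = {}, {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # broad on purpose: report whatever went wrong
            errors[url] = f"{type(exc).__name__}: {exc}"
    return pages, errors
```

Calling pages, errors = fetch_all(urls_of_movies) and printing errors shows, for each failing URL, the exception name and message that would be worth posting alongside the question.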

Solution 1
2 2019-05-13 11:14:40
