Beautiful Soup web scraping returns None - Python

Question

I have a list of movies whose genres I want to scrape from Google. I've built this code:

import requests
from bs4 import BeautifulSoup

list=['Se7en','Cinema Paradiso','The Shining','Toy Story 3','Capernaum']
gen2 = {}
for i in list:
  user_query = i +'movie genre'
  URL = 'https://www.google.co.in/search?q=' + user_query
  headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'}
  page = requests.get(URL, headers=headers)
  soup = BeautifulSoup(page.content, 'html.parser')
  c = soup.find(class_='EDblX DAVP1')
  print(c)
  if c != None:
    genres = c.findAll('a')
    gen2[i]= genres

But it returns an empty dict, so I checked the movies one by one and it worked, for example:

import requests
from bs4 import BeautifulSoup

user_query = 'Se7en movie genre' 
URL = "https://www.google.co.in/search?q=" + user_query
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
v = soup.find(class_='KKHQ8c')
h = {}
genres = v.findAll('a')
for genre in genres:
  h['Se7en']=genre

So I found out that inside the for loop the variable c is None. I can't figure out why! It is only None inside the loop.

Answer 1

Currently, your URLs are built by appending 'movie genre' directly to the title with no separator (the query for Se7en becomes "Se7enmovie genre"), and the spaces are left unencoded, so the results Google returns aren't accurate for all the movies. You can change it to:

`for i in list:
  # join the words of the title with '+', keeping i unchanged for use as a dict key
  query = "+".join(i.split(" "))
  user_query = query + "+movie+genre"
  URL = 'https://www.google.com/search?q=' + user_query`
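
As an alternative to joining the words with '+' by hand, the standard library's urllib.parse.quote_plus can encode the whole query, which also covers characters other than spaces. This is only a minimal sketch; the helper name build_search_url is hypothetical and not part of the original code.

```python
from urllib.parse import quote_plus

def build_search_url(title):
    """Return a Google search URL for '<title> movie genre' (hypothetical helper)."""
    # Note the space before 'movie': without it the query is 'Se7enmovie genre'.
    query = f"{title} movie genre"
    # quote_plus turns spaces into '+' and percent-encodes other unsafe characters.
    return "https://www.google.com/search?q=" + quote_plus(query)

movies = ['Se7en', 'Cinema Paradiso', 'The Shining', 'Toy Story 3', 'Capernaum']
for title in movies:
    print(build_search_url(title))
```

Naming the list movies instead of list also avoids shadowing the built-in list type.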

Also, movies that belong to a single genre, like Cinema Paradiso, are in a div with class name "Z0LcW".
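
Since the genre can sit in different containers depending on the movie, one option is to try several class names in turn and fall back to the container's plain text when it holds no links. The sketch below assumes the class names mentioned in the question and in this answer; Google changes its markup often, so they may stop matching. The function name extract_genres is hypothetical.

```python
from bs4 import BeautifulSoup

# Class names taken from the question and this answer. They are assumptions
# about Google's markup and may no longer match the live page.
CANDIDATE_CLASSES = ['EDblX DAVP1', 'KKHQ8c', 'Z0LcW']

def extract_genres(html):
    """Return genre strings from the first matching container, else []."""
    soup = BeautifulSoup(html, 'html.parser')
    for class_name in CANDIDATE_CLASSES:
        container = soup.find(class_=class_name)
        if container is None:
            continue  # this container is absent for this movie, try the next
        links = container.find_all('a')
        if links:
            return [a.get_text(strip=True) for a in links]
        # A single-genre box may hold plain text rather than links.
        text = container.get_text(strip=True)
        if text:
            return [text]
    return []
```

Inside the loop this could be used as gen2[i] = extract_genres(page.text), so a movie whose page has none of these containers gets an empty list instead of being skipped silently.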

Solution 1
0 2022-07-21 16:46:05
