How do I scrape text from the "People also ask" section of Google search using Selenium and Python
Selenium is really important here. I want to create a program that can help me scrape things from Google, such as the snippets, while also giving me the ability to automate the browser for some other tasks. Here's what I've done:
from selenium import webdriver as webd
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webd.Firefox()
driver.get('https://google.com/')
driver.find_element(By.NAME, 'q').send_keys('cars')
driver.find_element(By.NAME, 'q').send_keys(Keys.ENTER)
ques = driver.find_element(By.CLASS_NAME, "ULSxyf")
print(ques)
time.sleep(8)
driver.close()
Although the first few lines work fine, I'm unable to open the People also ask section with Selenium no matter what I do. I've tried the object's class name (found by inspecting it), its id, and so on. After searching for a while, I haven't found anything that helps with this specific case. I need to know exactly how to do this, or why my method isn't working. If anybody has any idea, I'd be glad if you let me know. Thanks!
I'm a total beginner in Selenium, so if you can't give me a straight answer but feel an article or tutorial would be better, that would help as well.
EDIT: I want to open the questions from the People also ask section and extract the answers from there along with the questions themselves.
To extract the text of the questions under the People also ask section, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following XPath-based locator strategy:
Code Block:
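driver.get("https://google.com/")
# Wait until the search box is clickable, then type the query and submit it
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("cars" + Keys.RETURN)
# Wait for the question blocks after the "People also ask" heading and print their data-q attributes
print([my_elem.get_attribute("data-q") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[text()='People also ask']//following::div[@data-q]")))])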
Console Output:
['Which is the most popular car?', 'What are top 10 cars?', 'Which type of car is best?', 'Which car is very cheapest?']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
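The snippet above returns only the question text. For the EDIT in the question (opening each question and reading its answer), one possible approach is sketched below. It reuses the same XPath and the data-q attribute from the answer. Everything else is an assumption: that clicking a div[@data-q] block expands it, that the expanded block's visible text contains the answer, and that a one-second pause is enough for the expansion. The names strip_question and scrape_people_also_ask are hypothetical, and Google's markup changes often, so treat this as a starting point rather than a verified solution.

```python
import time


def strip_question(block_text, question):
    """Return the visible text of an expanded block without the question line
    and without blank lines."""
    lines = [line.strip() for line in block_text.splitlines()]
    return "\n".join(line for line in lines if line and line != question)


def scrape_people_also_ask(query="cars", timeout=20):
    """Open Google in Firefox, search for `query`, and return {question: answer text}.

    Selenium is imported here so that strip_question can be used on its own.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get("https://google.com/")
        wait = WebDriverWait(driver, timeout)
        wait.until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys(query + Keys.RETURN)
        # Same locator as in the answer above
        blocks = wait.until(EC.visibility_of_all_elements_located(
            (By.XPATH, "//span[text()='People also ask']//following::div[@data-q]")))

        results = {}
        for block in blocks:
            question = block.get_attribute("data-q")
            if not question or question in results:
                continue
            block.click()   # assumption: a click expands the question
            time.sleep(1)   # assumption: crude pause while the answer renders
            results[question] = strip_question(block.text, question)
        return results
    finally:
        driver.quit()
```

Calling scrape_people_also_ask("cars") would launch a browser and return a dictionary keyed by question. As for why the original attempt likely failed: find_element ran right after pressing Enter, probably before the results page had rendered, and print(ques) prints the WebElement object rather than its text. Waiting explicitly and reading .text or an attribute avoids both problems.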