Python Web Scraping same class
I am looking for some help. I am pretty new to web scraping and am still learning the basics. I am developing an application where you can enter a question, and it fetches the answer(s) from Google and prints them.

When you enter a question in Google such as "what is a letter?", Google returns two definitions:

1. a character representing one or more of the sounds used in speech; any of the symbols of an alphabet. "a capital letter"
2. a written, typed, or printed communication, sent in an envelope by post or messenger. "he sent a letter to Mrs Falconer"

Both definitions have the same class when I inspect the elements, and I cannot get both of them printed. When I search by that class, only the first definition is printed, which I don't understand. Is there any way to print both, even though they share the same class? Here is my code:
import requests
from bs4 import BeautifulSoup
search = input("Search: ")
URL = "https://www.google.co.in/search?q=" + search
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 Edg/89.0.774.57'
}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
result = soup.find(class_="LTKOO sY7ric").get_text()
print(result)
find returns only the first match; find_all returns every match. This will give you the text of every element with that class:
txts = [ x.get_text() for x in soup.find_all(class_="LTKOO sY7ric")]
print(txts)
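To see why find printed only the first definition, here is a minimal sketch on a hard-coded HTML snippet. The markup and the class name "definition" are made up for illustration and are not Google's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two elements sharing one class,
# standing in for the two definitions on the results page.
html = """
<div class="definition">a character representing one or more of the sounds used in speech</div>
<div class="definition">a written, typed, or printed communication</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag (or None).
first = soup.find(class_="definition").get_text()

# find_all() returns every match, so both definitions are collected.
all_texts = [tag.get_text() for tag in soup.find_all(class_="definition")]

print(first)
print(all_texts)
```

The same difference applies to the real page: swapping find for find_all is what makes the second definition show up.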
You can loop over every element that soup.find_all returns for the required class name and print the text of each one:
for ele in soup.find_all(class_="LTKOO sY7ric"):
    print(ele.get_text())
The loop will help you extract all possible values:
import requests
from bs4 import BeautifulSoup
search = input("Search: ")
URL = "https://www.google.co.in/search?q=" + search
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36 Edg/89.0.774.57'
}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
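results = []
for ele in soup.find_all(class_="LTKOO sY7ric"):
    # ele is already the matching element, so read its text directly
    # instead of searching inside it for the same class again.
    result = ele.get_text().strip()
    # Fall back to a placeholder when the element has no text.
    results.append(result or 'no data')
print(results)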
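One caveat with class_="LTKOO sY7ric": when the string contains a space, BeautifulSoup compares it against the exact value of the class attribute, so an element that lists the same classes in another order, or with an extra class, is skipped. The sketch below uses made-up markup to show the difference and a CSS selector via select as one possible alternative; whether Google's page actually varies the class order is an assumption, not something confirmed here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, but the second element
# lists them in a different order and carries an extra class.
html = """
<div class="LTKOO sY7ric">first</div>
<div class="sY7ric LTKOO extra">second</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: only the first element matches.
exact = [tag.get_text() for tag in soup.find_all(class_="LTKOO sY7ric")]

# CSS selector: matches any element that has both classes, in any order.
by_selector = [tag.get_text() for tag in soup.select(".LTKOO.sY7ric")]

print(exact)
print(by_selector)
```

If find_all returns fewer elements than expected on the live page, trying the select form is a cheap check.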
I hope this helps.