Web scraping using Python
I'm trying to get data for a list of companies (currently testing with only one) from a website. I can't work out how to get the score I want, because I can only find the formatting markup rather than the actual data. Could someone please help?
from selenium import webdriver
import time
from selenium.webdriver.support.select import Select
driver=webdriver.Chrome(executable_path=r'C:\webdrivers\chromedriver.exe')  # raw string so the backslashes are not read as escapes
driver.get('https://www.refinitiv.com/en/sustainable-finance/esg-scores')
driver.maximize_window()
time.sleep(1)
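# Dismiss the cookie banner if it is present; the lookup sits inside the try
# so that a missing banner does not stop the script.
try:
    cookie = driver.find_element("xpath", '//button[@id="onetrust-accept-btn-handler"]')
    cookie.click()
except Exception:
    pass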
company_name=driver.find_element("id",'searchInput-1')
company_name.click()
company_name.send_keys('Jumbo SA')
time.sleep(1)
search=driver.find_element("xpath", '//button[@class="SearchInput-searchButton"]')
search.click()
time.sleep(2)
company_score = driver.find_elements("xpath",'//div[@class="fiscal-year"]')
print(company_score)
That's what I have so far. I want the number "42" to come back in my results, but instead I get the following:
[<selenium.webdriver.remote.webelement.WebElement (session="bffa2fe80dd3785618b5c52d7087096d", element="62eaf2a8-d1a2-4741-8374-c0f970dfcbfe")>]
My issue is that the locator is not working. I think the //div[@class="fiscal-year"] part is wrong, but I am not sure what I need to pick from the website.
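A side note on the printed output: find_elements returns a list of WebElement objects, so print shows their default representation rather than the page content. The visible text of an element is available through its .text attribute. Below is a minimal sketch of collecting that text. element_texts is a hypothetical helper, FakeElement is only a stand-in so the snippet runs without a browser, and the sample values are made up; whether the fiscal-year div actually holds the score is not confirmed here.

```python
from dataclasses import dataclass


@dataclass
class FakeElement:
    """Stand-in for a Selenium WebElement (hypothetical, for illustration only)."""
    text: str


def element_texts(elements):
    """Return the stripped visible text of each element, skipping empty ones."""
    return [el.text.strip() for el in elements if el.text.strip()]


# With Selenium this would be called as:
#   element_texts(driver.find_elements("xpath", '//div[@class="fiscal-year"]'))
sample = [FakeElement(" 42 "), FakeElement("")]  # made-up sample values
print(element_texts(sample))
```

If the list comes back empty or holds the wrong text, the locator is pointing at the wrong element and a different XPath is needed.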
Please use requests instead. Look at this example:
import requests
url = "https://www.refinitiv.com/bin/esg/esgsearchsuggestions"
payload = ""
response = requests.request("GET", url, data=payload)
print(response.text)
This returns something like this:
[
    {
        "companyName": "GEK TERNA Holdings Real Estate Construction SA",
        "ricCode": "HRMr.AT"
    },
    {
        "companyName": "Mytilineos SA",
        "ricCode": "MYTr.AT"
    },
    {
        "companyName": "Hellenic Telecommunications Organization SA",
        "ricCode": "OTEr.AT"
    },
    {
        "companyName": "Jumbo SA",
        "ricCode": "BABr.AT"
    },
    {
        "companyName": "Folli Follie Commercial Manufacturing and Technical SA",
        "ricCode": "HDFr.AT"
    },
    ...
]
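To pick the code for one company out of a list like the one above without reading it by eye, the parsed JSON can be searched by name. This is a minimal sketch, assuming the response keeps the companyName and ricCode keys shown above. find_ric_code is a hypothetical helper, and the sample string is a shortened copy of the output above; with a live response, response.json() would take the place of json.loads.

```python
import json

# Shortened copy of the suggestions output shown above.
suggestions_json = """
[
  {"companyName": "Mytilineos SA", "ricCode": "MYTr.AT"},
  {"companyName": "Jumbo SA", "ricCode": "BABr.AT"}
]
"""


def find_ric_code(suggestions, company_name):
    """Return the ricCode whose companyName matches (case-insensitive), else None."""
    wanted = company_name.strip().lower()
    for entry in suggestions:
        if entry.get("companyName", "").strip().lower() == wanted:
            return entry.get("ricCode")
    return None


suggestions = json.loads(suggestions_json)
print(find_ric_code(suggestions, "Jumbo SA"))  # BABr.AT
```

Returning None for an unknown name makes it easy to skip companies that the endpoint does not list when looping over several of them.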
Here we can see each company name and the code behind it, so for Jumbo SA it is BABr.AT. Now, with this info, let's get the data:
import requests
url = "https://www.refinitiv.com/bin/esg/esgsearchresult"
querystring = {"ricCode":"BABr.AT"} ## supply the company code
payload = ""
headers = {"cookie": "encaddr=NeVecfNa7%2FR1rLeYOqY57g%3D%3D"}
response = requests.request("GET", url, data=payload, headers=headers, params=querystring)
print(response.text)
Now we see the response is in JSON:
{
    "industryComparison": {
        "industryType": "Specialty Retailers",
        "scoreYear": "2020",
        "rank": "162",
        "totalIndustries": "281"
    },
    "esgScore": {
        "TR.TRESGCommunity": {
            "score": 24,
            "weight": 0.13
        },
        "TR.TRESGInnovation": {
            "score": 9,
            "weight": 0.05
        },
        "TR.TRESGHumanRights": {
            "score": 31,
            "weight": 0.08
        },
        "TR.TRESGShareholders": {
            "score": 98,
            "weight": 0.08
        },
        "TR.SocialPillar": {
            "score": 43,
            "weight": 0.42999998
        },
        "TR.TRESGEmissions": {
            "score": 19,
            "weight": 0.08
        },
        "TR.TRESGManagement": {
            "score": 47,
            "weight": 0.26
        },
        "TR.GovernancePillar": {
            "score": 53,
            "weight": 0.38999998569488525
        },
        "TR.TRESG": {
            "score": 42,
            "weight": 1
        },
        "TR.TRESGWorkforce": {
            "score": 52,
            "weight": 0.1
        },
        "TR.EnvironmentPillar": {
            "score": 20,
            "weight": 0.19
        },
        "TR.TRESGResourceUse": {
            "score": 30,
            "weight": 0.06
        },
        "TR.TRESGProductResponsibility": {
            "score": 62,
            "weight": 0.12
        },
        "TR.TRESGCSRStrategy": {
            "score": 17,
            "weight": 0.05
        }
    }
}
Now you can get the data you want without using Selenium. This way it's faster and easier, because no browser has to be started.
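The 42 the question asks for appears to be the value stored under esgScore, TR.TRESG, score in the JSON response. Here is a minimal sketch of reading it, assuming that structure holds for other companies too. get_score is a hypothetical helper, and the sample string is a trimmed copy of the response shown earlier; with a live response, response.json() would take the place of json.loads.

```python
import json

# Trimmed copy of the ESG response shown earlier; only a few keys are kept.
result_json = """
{
  "industryComparison": {"industryType": "Specialty Retailers", "scoreYear": "2020",
                         "rank": "162", "totalIndustries": "281"},
  "esgScore": {
    "TR.TRESG": {"score": 42, "weight": 1},
    "TR.SocialPillar": {"score": 43, "weight": 0.42999998},
    "TR.EnvironmentPillar": {"score": 20, "weight": 0.19}
  }
}
"""


def get_score(result, key="TR.TRESG"):
    """Return the score stored under esgScore[key], or None if it is missing."""
    return result.get("esgScore", {}).get(key, {}).get("score")


result = json.loads(result_json)
print(get_score(result))                          # 42
print(get_score(result, "TR.EnvironmentPillar"))  # 20
```

Using .get at each level means a company with no score data yields None instead of raising a KeyError.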
Please accept this as an answer.