Web scraping in Python - extracting values from a website

Question

I'm trying to extract two values from this website:

bizportal.co.il

One value is the dollar rate, shown on the right; the other, shown on the left, is the percentage drop/rise.

The problem is that after I get the dollar rate value, the number is rounded for some reason (you can see this in the terminal). I want to get the exact number shown on the website.

Is there some friendly documentation for web scraping in Python?

PS: How can I get rid of the pop-up Python terminal window when running code in VS? I just want the output to appear in VS, in the interactive window.

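from urllib.request import urlopen

from bs4 import BeautifulSoup

my_url = "https://www.bizportal.co.il/forex/quote/generalview/22212222"

# Download the raw HTML of the quote page
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()

# Parse the HTML and collect every <div class="data-row">
page_soup = BeautifulSoup(page_html, "html.parser")
div_class = page_soup.find_all("div", {"class": "data-row"})

print(div_class)
#print(div_class[0].text)
#print(div_class[1].text)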

Answer 1

The data is loaded dynamically via Ajax, but you can simulate this request with the requests module:

import json
import requests

url = 'https://www.bizportal.co.il/forex/quote/generalview/22212222'
ajax_url = "https://www.bizportal.co.il/forex/quote/AjaxRequests/DailyDeals_Ajax?paperId={paperId}&take=20&skip=0&page=1&pageSize=20"
paper_id = url.rsplit('/')[-1]
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

data = requests.get(ajax_url.format(paperId=paper_id), headers=headers).json()

# uncomment this to print all data:
#print(json.dumps(data, indent=4))

# print first one
print(data['Data'][0]['rate'], data['Data'][0]['PrecentageRateChange'])

Prints:

3.4823 -0.76%
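
The URL handling and the response parsing above can be separated from the network call, which makes them easy to check offline. The following is a minimal sketch and not part of the original answer: `build_ajax_url` and `first_quote` are hypothetical helper names, and the sample payload is made up to mirror only the two keys printed above (the real response probably contains more fields, and whether `rate` arrives as a number or a string is an assumption). Values are converted with `Decimal` so the digits are kept exactly as received.

```python
import json
from decimal import Decimal
from urllib.parse import urlencode, urlsplit

# Endpoint taken from the answer above; everything else here is illustrative.
AJAX_ENDPOINT = "https://www.bizportal.co.il/forex/quote/AjaxRequests/DailyDeals_Ajax"


def build_ajax_url(page_url, page_size=20):
    """Build the Ajax URL for the paper id found at the end of page_url."""
    # The paper id is the last path segment of the quote page URL.
    paper_id = urlsplit(page_url).path.rstrip("/").rsplit("/", 1)[-1]
    query = urlencode({
        "paperId": paper_id,
        "take": page_size,
        "skip": 0,
        "page": 1,
        "pageSize": page_size,
    })
    return f"{AJAX_ENDPOINT}?{query}"


def first_quote(payload):
    """Return (rate, percentage change) of the first row as Decimal values."""
    row = payload["Data"][0]
    # str() first so a float is not expanded into its binary approximation.
    rate = Decimal(str(row["rate"]))
    # Assumes the change is a string such as "-0.76%".
    change = Decimal(row["PrecentageRateChange"].rstrip("%"))
    return rate, change


# Hypothetical payload with the same shape as the fields used above.
sample = json.loads('{"Data": [{"rate": 3.4823, "PrecentageRateChange": "-0.76%"}]}')

print(build_ajax_url("https://www.bizportal.co.il/forex/quote/generalview/22212222"))
rate, change = first_quote(sample)
print(rate, change)  # 3.4823 -0.76
```

With requests, the real payload would come from `requests.get(build_ajax_url(url), headers=headers).json()` as in the answer, and could be passed to `first_quote` unchanged if the assumptions about its shape hold.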

Answer 2

The problem is that this element is updated dynamically with JavaScript. You will not be able to scrape the up-to-date value by fetching the page itself with urllib or requests. When the page loads, it is populated with a recent value (likely from a database), which is then replaced with the real-time number via JavaScript.

In this case it would be better to use something like Selenium to load the web page. This lets the JavaScript execute on the page, after which you can scrape the numbers.

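from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

options = Options()
options.add_argument("--headless") # allows you to scrape page without opening the browser window
# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'), options=options)

driver.get("https://www.bizportal.co.il/forex/quote/generalview/22212222")
time.sleep(1) # put in to allow JS time to load, sometimes works without.
# Selenium 4 removed the find_element(s)_by_* helpers; use find_element(s) with By instead
values = driver.find_elements(By.CLASS_NAME, 'num')
price = values[0].get_attribute("innerHTML")
change = values[1].find_element(By.CSS_SELECTOR, "span").get_attribute("innerHTML")
driver.quit() # close the headless browser once the values are read

print(price, "\n", change)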

Output:

╰─$ python selenium_scrape.py
3.483 
 -0.74%

You should familiarize yourself with Selenium and understand how to set it up and run it. This includes installing the browser (I am using Chrome here, but you can use others), knowing where to get the browser driver (Chromedriver in this case), and understanding how to parse the page. You can learn all about it here: https://www.selenium.dev/documentation/en/
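
The fixed `time.sleep(1)` in the script above is a guess at how long the JavaScript needs. A more robust pattern is to poll until the value is actually present. Below is a minimal, library-independent sketch of that idea; `wait_until` and `FakePage` are hypothetical names introduced only for illustration, and `FakePage` stands in for a real browser so the snippet runs without Selenium. Selenium ships its own version of this pattern as `WebDriverWait` in `selenium.webdriver.support.ui`.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakePage:
    """Stand-in for a page whose price only appears after a few polls."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def read_price(self):
        self.calls += 1
        # An empty string is falsy, so wait_until keeps polling until text appears.
        return "3.483" if self.calls >= self.ready_after else ""


page = FakePage(ready_after=3)
price = wait_until(page.read_price, timeout=2.0, interval=0.01)
print(price)  # 3.483
```

In the Selenium script, the predicate could be a small function that looks up the `num` elements and returns their text once it is non-empty; that wiring is left out here because it depends on the page's markup.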

Answer 1
2 2020-06-02 10:26:09

Answer 2
0 2020-06-02 09:32:46
