
Python BeautifulSoup scraping web page that has protection

I am new to Python. My recent project is to get the odds information from a website.

Here is the URL:

http://bet.hkjc.com/default.aspx?url=football/odds/odds_allodds.aspx&lang=EN&tmatchid=120998

I am using Python and BeautifulSoup to process the page, but I can't see any odds data when I print the HTML with

soup.prettify()

The result I get from the above code is only program logic: variables and functions. I think the page has some protection on the data.

What should I do to get the odds information from the protected web page?

It looks like it's not protected, just generated with JavaScript, and BeautifulSoup can't run JavaScript. The first workaround most people reach for is automating a web browser with something like Selenium. You can use it to get the HTML after the JS has run, and then parse that with BeautifulSoup as needed.
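A rough sketch of that workaround, in case it helps. It assumes Selenium 4 with a local Chrome install, and it assumes the odds end up in the top-level document as `table.tOdds` (if the site puts them in a frame, you would have to switch into it first). The helper names `fetch_rendered_html` and `odds_text` are made up for this example.

```python
import bs4


def fetch_rendered_html(url, wait_selector="table.tOdds", timeout=15):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so odds_text() below still works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the rendered odds show up as elements matching wait_selector.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def odds_text(html):
    """Return the text of every <table class="tOdds"> found in html."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return [t.get_text(" ", strip=True) for t in soup.find_all("table", class_="tOdds")]


# Example call (needs Chrome and network access, so it is left commented out):
# print(odds_text(fetch_rendered_html("http://bet.hkjc.com/default.aspx?url=...")))
```

Run on HTML that contains only scripts, `odds_text` returns an empty list, which is what the question ran into: the tables only exist once the JavaScript has executed.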

The answer posted by SuperStew is right, but the page uses JavaScript to load "http://bet.hkjc.com/football/odds/odds_allodds.aspx?lang=EN&tmatchid=120998", and it is this page that has the odds data. You didn't state which odds you wanted, but the code below is an example of one way to get some data; if you want other data, you will have to modify it.

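import bs4
import requests

# This inner page, not default.aspx, is the one that contains the odds tables.
url = "http://bet.hkjc.com/football/odds/odds_allodds.aspx?lang=EN&tmatchid=120998"
page = requests.get(url)
page.raise_for_status()  # stop early on an HTTP error
soup = bs4.BeautifulSoup(page.text, 'lxml')

# Each block of odds is a <table class="tOdds">.
tOdds = soup.find_all('table', {'class': "tOdds"})
for tOdd in tOdds:
    print(tOdd.text)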

Outputs:

  Jong PSV Eindhoven(Home) Draw Jong Utrecht(Away)   1.53 4.00 4.60 
  Jong PSV Eindhoven(Home) Draw Jong Utrecht(Away)   1.97 2.45 4.70 
  Jong PSV Eindhoven[-1](Home) Draw Jong Utrecht[+1](Away)   2.45 3.60 2.26 
  Line High Low  [3/3.5]2.021.70
  Line High Low   [1.5]2.191.60
     1.44    18.00    2.65   
  0 1 2 3 4 5 6 7+   18.00 6.60 4.10 3.65 4.50 6.70 11.00 14.00 
  Odd Even   1.90 1.80 
  H H H D D D A A A   H D A H D A H D A   2.30 14.00 34.00 4.70 6.50 10.50 19.00 14.00 7.50 
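Printing `.text` runs the cells together (for example `2.021.70` above), so if you need the individual values it may be easier to walk the rows and cells. A minimal sketch, assuming each odds table uses plain `<tr>` rows with `<td>`/`<th>` cells and no nested tables; the `SAMPLE` markup and the `table_rows` helper are made up for illustration and are not copied from the site.

```python
import bs4


def table_rows(html, css_class="tOdds"):
    """For each matching table, return a list of rows; each row is a list of cell strings."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table", class_=css_class):
        rows = []
        for tr in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            if cells:  # skip rows that have no cells
                rows.append(cells)
        tables.append(rows)
    return tables


# Hypothetical markup standing in for one odds table.
SAMPLE = """
<table class="tOdds">
  <tr><td>Home</td><td>Draw</td><td>Away</td></tr>
  <tr><td>1.53</td><td>4.00</td><td>4.60</td></tr>
</table>
"""

print(table_rows(SAMPLE))
```

With the real page you would pass `page.text` instead of `SAMPLE`, and probably adjust the row handling once you have looked at the actual markup.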
