How to solve the "Not Acceptable" error when trying to fetch HTML content with BeautifulSoup

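While trying to scrape data from a website, I can't get the site's HTML content with Beautiful Soup. I'm using a basic requests.get call to fetch the HTML, but the page content never comes back; all I get is a short error page.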

import requests
from bs4 import BeautifulSoup as soup

url= 'https://www.turbobearings.com/application.php'
html= requests.get(url)
soup_= soup(html.content, 'lxml')
newtry= soup_.find('div', 'class', 'kblock kcategories-1')

Result: <html><head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>
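The page in Result is not the site's content but an error page from Mod_Security, which typically sends it with HTTP status 406 Not Acceptable. Checking the response before parsing makes this visible immediately. A minimal sketch, assuming you pass in the response's status_code and text; the helper name looks_blocked is hypothetical and the body check is only a heuristic:

```python
from http import HTTPStatus


def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic: does this response look like a Mod_Security rejection
    rather than the real page? (hypothetical helper, not exhaustive)"""
    # 406 is the status that normally accompanies the "Not Acceptable!" page
    if status_code == HTTPStatus.NOT_ACCEPTABLE:
        return True
    # Fallback in case the error page arrives with a different status code
    return "Mod_Security" in body and "Not Acceptable" in body


# The error page from the question, used here as sample input
error_page = (
    "<html><head><title>Not Acceptable!</title></head><body>"
    "<h1>Not Acceptable!</h1><p>An appropriate representation of the "
    "requested resource could not be found on this server. This error was "
    "generated by Mod_Security.</p></body></html>"
)

print(looks_blocked(406, error_page))          # True
print(looks_blocked(200, "<html>ok</html>"))   # False
```

With requests you can also call response.raise_for_status(), which raises requests.HTTPError for any 4xx or 5xx status, including 406.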

What can I do to get the HTML content of this page?

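The problem is that the site detects that you are using an automated tool, such as requests, to pull its HTML. The "Not Acceptable" page is generated by Mod_Security, a web application firewall, and by default requests sends a User-Agent of the form python-requests/<version>, which rules like this likely reject. To get around this, add a browser-like User-Agent header to your request. The User-Agent makes your request look like it comes from a regular browser, so the site doesn't flag it as a bot. Here's how: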

headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}

url= 'https://www.turbobearings.com/application.php'

html= requests.get(url,headers=headers)
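If you make several requests, setting the header once on a requests.Session avoids repeating it. The sketch below prepares the request without sending it, so you can check which User-Agent would go out; it reuses the URL and header from above and assumes nothing about the site beyond that:

```python
import requests

url = 'https://www.turbobearings.com/application.php'
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}

# The identity requests sends when you don't set a User-Agent yourself
print(requests.utils.default_user_agent())  # python-requests/<installed version>

session = requests.Session()
session.headers.update(headers)  # header keys are case-insensitive; this replaces the default

# Build the request exactly as the session would send it, without any network I/O
prepared = session.prepare_request(requests.Request('GET', url))
print(prepared.headers['User-Agent'])

# To actually fetch the page:
# response = session.get(url, timeout=10)
# response.raise_for_status()
```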

Here is the complete code:

from bs4 import BeautifulSoup as soup
import requests
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}

url= 'https://www.turbobearings.com/application.php'
html= requests.get(url,headers=headers)
soup_= soup(html.content, 'lxml')
# class_ (with a trailing underscore) filters on the class attribute
newtry= soup_.find('div', class_='kblock kcategories-1')

print(soup_)
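Separately from the 406 error, a call written as find('div', 'class', 'kblock kcategories-1') does not select by those classes: the second positional argument of find is attrs, and a plain string there is treated as a class name, so it searches for class="class". A self-contained sketch on a small hypothetical snippet (html.parser is used so lxml isn't required):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<div class="kblock kcategories-0">other</div>
<div class="kblock kcategories-1">wanted</div>
"""
soup_ = BeautifulSoup(html, 'html.parser')

# Matches the exact string value of the class attribute
a = soup_.find('div', class_='kblock kcategories-1')

# CSS selector: matches a div carrying both classes, in any order
b = soup_.select_one('div.kblock.kcategories-1')

# Positional form: 'class' becomes attrs, i.e. a search for class="class"
c = soup_.find('div', 'class', 'kblock kcategories-1')

print(a.get_text())  # wanted
print(b.get_text())  # wanted
print(c)             # None
```

The CSS selector is the more forgiving choice if the site ever reorders the classes or adds another one.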

Output:

<!DOCTYPE html>
<html class="" lang="en-gb" xml:lang="en-gb" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Turbo Bearings Pvt Ltd Rajkot, Gujarat, India</title>
<meta content="width=device-width, initial-scale=1" name="viewports"/>
<meta content="Tapered Roller Bearings, Double Row Tapered Roller Bearings, Cylindrical Roller Bearings, Needle Roller Bearings, Ball Bearings, Rajkot, Gujarat, India" name="Description"/>
<meta content="Tapered Roller Bearings Metric Series, Tapered Roller Bearings Inch Series, Double Row Tapered Roller Bearings, Cylindrical Roller Bearings, Special Cylindrical Roller Bearings, Single Row Full Compliment, Double Row Full Compliment, Needle Roller Bearings, Split Type 2 Needle Roller Bearings, NK Type, Full Compliment Bearings, Ball Bearings, Deep Groove Ball Bearings, Special Ball Bearings, Angular Contact Ball Bearings, Special Angular Contact Ball Bearings, Steering Ball Bearings, Hub Ball Bearings, Thrust Ball Bearings, Thrust Single Direction, Thrust Double Direction, Clutch Bearings, Four Point Contact Bearings, King Pin Bearings, Steering Bearings, Spherical Roller Thrust Bearings, Special Roller Thrust Double Row, Standard Room, Laboratory, Metallurgical Microscope, Profile Projector, Talyrond, Form Talysurf, Roller Sorting Machine, Bearing Noise Level &amp; Vibration Tester, Grinding Facilities, Ring Super Finishing Facilities, Roller Super Finishing Facilites, Assembly and Dispatch Facilities, Laser Marking Machine, Design &amp; Drawing Department, Bearing Endurance (Life) Testing Machine" name="Keywords"/>
<meta content="index,follow" name="robots"/>
<meta content="Turbo Bearings Pvt. Ltd." name="Author"/>
<meta content="global" name="distribution"/>
<meta content="document" name="resource-type"/>
<meta content="http://www.turbobearings.com" name="identifier-url"/>
<meta content="www.turbobearings.com" name="copyright"/>
<meta content="IN" name="country"/>
<meta content="Manufacturers and Exporters" name="rating"/>
<meta content="2 days" name="revisit-after"/>
<link href="favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
<link href="css/layout.css" rel="stylesheet" type="text/css"/>
<link href="css/jquery.fancybox.css" rel="stylesheet" type="text/css"/>
<link href="css/jquery.fancybox-buttons.css" rel="stylesheet" type="text/css"/>
<link href="css/jquery.fancybox-thumbs.css" rel="stylesheet" type="text/css"/>
<link href="css/template.css" rel="stylesheet" type="text/css"/>
<link href="css/lrstyle.css" rel="stylesheet" type="text/css"/>
<link href="css/caroufredsel.css" rel="stylesheet" type="text/css"/>
<link href="css/camera.css" rel="stylesheet" type="text/css"/>
<link href="css/default_icemegamenu.css" rel="stylesheet" type="text/css"/>
<link href="css/default_icemegamenu-reponsive.css" rel="stylesheet" type="text/css"/>
<link href="css/navbar.css" rel="stylesheet" type="text/css"/>
<link href="css/search.css" rel="stylesheet" type="text/css"/>
<link href="css/plugin_googlemap3.css" rel="stylesheet" type="text/css"/>
<script src="js/jquery.min.js" type="text/javascript"></script>
<!--  <script type='text/javascript' src='http://maps.googleapis.com/maps/api/js?v=3&amp;language=en-GB&amp;libraries=places'></script> -->
<script src="js/jquery-noconflict.js" type="text/javascript"></script>
<script src="js/jquery-migrate.min.js" type="text/javascript"></script>
<script src="js/caption.js" type="text/javascript"></script>
<script src="js/bootstrap.min.js" type="text/javascript"></script>
<script src="js/jquery.validate.min.js" type="text/javascript"></script>
<script src="js/additional-methods.min.js" type="text/javascript"></script>
<script src="js/jquery.caroufredsel.js" type="text/javascript"></script>
<script src="js/camera.min.js" type="text/javascript"></script>
<script src="js/menu.js" type="text/javascript"></script>
<script src="js/jquery.rd-navbar.js" type="text/javascript"></script>
<script src="js/googlemapsv3.js" type="text/javascript"></script>
<script src="js/TMSearch.js" type="text/javascript"></script>
<script language="JavaScript" src="scripts/gen_validatorv31.js" type="text/javascript"></script>
<script type="text/javascript">
jQuery(window).on('load',  function() {
                new JCaption('img.caption');
            });
jQuery(document).ready(function(){
    jQuery('.hasTooltip').tooltip({"html": true,"container": "body"});
});
window.setInterval(function(){var r;try{r=window.XMLHttpRequest?new XMLHttpRequest():new ActiveXObject("Microsoft.XMLHTTP")}catch(e){}if(r){r.open("GET","./",true);r.send(null)}},840000);
jQuery(document).ready(function($){
     RDMobilemenu_autoinit("#icemegamenu");
})
  </script>
<script>
    function GetProduct(prodid)
    {
        var prod_id=document.getElementById(prodid).innerHTML;
        document.getElementById('txtSeachProd').value=prod_id;
    }
</script>
<script>
...
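Rather than printing the whole document, checking the <title> is a quick way to confirm that the real page came back instead of the Mod_Security error page. A sketch using a hypothetical helper named page_title and two short sample documents built from the titles shown above:

```python
from bs4 import BeautifulSoup


def page_title(html: str) -> str:
    """Return the stripped <title> text, or '' if the document has none."""
    soup_ = BeautifulSoup(html, 'html.parser')
    return soup_.title.get_text(strip=True) if soup_.title else ''


blocked = "<html><head><title>Not Acceptable!</title></head><body></body></html>"
real = "<html><head><title>Turbo Bearings Pvt Ltd Rajkot, Gujarat, India</title></head></html>"

print(page_title(blocked))  # Not Acceptable!
print(page_title(real))     # Turbo Bearings Pvt Ltd Rajkot, Gujarat, India
```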
