
python web scraping: onclick ajax request returns nothing with status 200

I am trying to scrape table data from a website. The data I want is "hidden" behind an onclick event.

<a class="text" onclick="javascript:openPAOnSR_RS('some_sku', 'brandname','divId', 'some_args','OPC Page Details');cmTagAndLink('Open Link','OPC Page Details',null,null,null);">The Click</a>

After clicking, a POST request is made; some of its details are below.

Request URL:http://www.somewebsite.com/catalog/tables.do?some_sku=sku&brandKey=brandname&divId=divId
Request Method:POST
Status Code:200 OK
Remote Address:23.xxxxxxxxxxx
Referrer Policy:no-referrer-when-downgrade

So I wrote the code below, but it did not return anything.

from urllib.parse import urlencode
from requests.exceptions import RequestException
import requests


def get_page_index():
    # Parameters copied from the Request URL captured in the browser's network tab
    string_param = {
        'some_sku': 'sku',
        'brandKey': 'brandname',
        'divId': 'divId'
    }

    # POST to the same URL the onclick handler calls, with the query string attached
    url = "http://www.somewebsite.com/catalog/tables.do?" + urlencode(string_param)
    try:
        response = requests.post(url=url, data=string_param)
        if response.status_code == 200:
            print(response.url, response.content)
            return response.text
        return None
    except RequestException as e:
        print(e)
        return None

I am getting no output even though the status shows 200. How should I get the data "behind" the onclick event?
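For reference, one common reason such an endpoint answers 200 with an empty body is that the request lacks the headers the browser sends along with the XHR. The sketch below resends the same request with those headers added; the header values, the Referer URL, and the assumption that the site checks them are guesses, not something confirmed from the capture above.

import requests

# Assumption: the endpoint may check for browser-like AJAX headers.
headers = {
    'User-Agent': 'Mozilla/5.0',
    'X-Requested-With': 'XMLHttpRequest',               # marks the call as AJAX
    'Referer': 'http://www.somewebsite.com/catalog/',   # hypothetical referring page
}
params = {'some_sku': 'sku', 'brandKey': 'brandname', 'divId': 'divId'}

response = requests.post(
    'http://www.somewebsite.com/catalog/tables.do',
    params=params,    # query string, as in the captured Request URL
    data=params,      # form body; the capture does not show the body, so this is a guess
    headers=headers,
)
print(response.status_code, len(response.text))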

urllib will only give you the raw HTML content, so you can't interact with the JS on that website. There are modules like robobrowser and scrapy, but on their own they only click HTML check boxes or buttons.
So the preferable options are:

1) Selenium with a headless browser such as PhantomJS (a minimal sketch follows after this list).

2) Scrapy + Splash (also sketched below).
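A minimal Selenium sketch of option 1; headless Chrome is used here instead of PhantomJS only because PhantomJS is no longer maintained, and the page URL and locators are assumptions since the real page structure is unknown:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('--headless')            # run the browser without a window
driver = webdriver.Chrome(options=options)    # assumes chromedriver is on PATH

driver.get('http://www.somewebsite.com/catalog/the-product-page')  # hypothetical page URL

# Find the anchor whose onclick opens the table and click it, letting the
# browser fire the AJAX POST itself; the selector is an assumption.
link = driver.find_element(By.CSS_SELECTOR, 'a.text[onclick*="openPAOnSR_RS"]')
link.click()

# Wait until an element with the id named in the onclick arguments is present
# in the DOM (the id is an assumption), then read the rendered HTML.
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'divId')))
html = driver.page_source
driver.quit()

And a rough sketch of option 2, assuming a Splash instance is running (for example via Docker) and the scrapy-splash middleware is configured in settings.py as its README describes; the URL and selector are placeholders:

import scrapy
from scrapy_splash import SplashRequest   # requires the scrapy-splash package


class TableSpider(scrapy.Spider):
    name = 'tables'

    def start_requests(self):
        # Hypothetical product page; Splash renders it and runs the page's JavaScript.
        yield SplashRequest(
            'http://www.somewebsite.com/catalog/the-product-page',
            self.parse,
            args={'wait': 2},   # give the AJAX call time to finish
        )

    def parse(self, response):
        # Selector is an assumption: the div named in the onclick arguments.
        yield {'table_html': response.css('#divId').get()}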

Can I ask what steps you go through before clicking the button?
Are you clicking it after entering some information, or just clicking it as soon as the page appears?
