
Python Scrapy: web-scraping an ASP site

I have scraped many other sites that required form input, but I'm struggling to figure this one out:

http://search.appleone.com/ResumeSearch/results.asp

When I search for something, the website returns a generic results.asp page that isn't specific to the search term. What I'm trying to do is enter a search and then scrape the results page; it's the search-input part I'm struggling with. Typically, I would put the search term straight into the URL, something like http://bdomainnameh.com/search/ followed by the search input.

I'd appreciate any help.

The form has a hidden input (line 285 of the page source):

<form name="frmAE" action="process.asp?page=SearchDetailed" method="POST">
<input type=hidden name="hdnAction" value="">
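Hidden fields like this are easy to miss when looking at the rendered page. Below is a minimal standard-library sketch for listing a form's name, action, method and input fields, assuming you already have the page HTML as a string (here it is just the two lines quoted above, closed with a </form> tag). It only looks at input elements, not select or textarea, and the class name FormFieldLister is made up for illustration.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Collect each <form>'s name/action/method and the name/type/value of its <input>s."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form> encountered

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "form":
            self.forms.append({
                "name": attrs.get("name"),
                "action": attrs.get("action"),
                # HTML forms default to GET when no method is given
                "method": (attrs.get("method") or "GET").upper(),
                "inputs": [],
            })
        elif tag == "input" and self.forms:
            # Attach the input to the most recently opened form
            # (closing tags are not tracked; good enough for quick inspection)
            self.forms[-1]["inputs"].append({
                "name": attrs.get("name"),
                "type": (attrs.get("type") or "text").lower(),
                "value": attrs.get("value") or "",
            })


SAMPLE = """
<form name="frmAE" action="process.asp?page=SearchDetailed" method="POST">
<input type=hidden name="hdnAction" value="">
</form>
"""

parser = FormFieldLister()
parser.feed(SAMPLE)
for form in parser.forms:
    print(form["name"], form["method"], form["action"])
    for field in form["inputs"]:
        print("  ", field["type"], field["name"], repr(field["value"]))
```

Running it on the full page source shows every field the browser would send, which is where the hidden hdnAction input shows up.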

When the "Next>>" button is clicked, it calls sendForm(2, 0) (line 428), which assigns the value 2 to the hidden input (line 245) and then submits the form as a POST request. Because it is a POST rather than a GET, the search terms travel in the request body instead of the URL. That is why the results page looks "not specific to the search term": the search terms simply don't appear in the URL.
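To see the difference concretely, the request the browser sends can be built by hand. This is a minimal standard-library sketch that constructs the POST but never sends it. It assumes the form lives on the results.asp page from the question, and txtKeywords is a hypothetical name for the search box; read the real field name(s) from the page source.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

PAGE_URL = "http://search.appleone.com/ResumeSearch/results.asp"
FORM_ACTION = "process.asp?page=SearchDetailed"  # from the <form action=...> attribute

# hdnAction=2 mirrors what sendForm(2, 0) writes into the hidden input.
# "txtKeywords" is a HYPOTHETICAL field name used only for illustration.
fields = {"hdnAction": "2", "txtKeywords": "python developer"}

# Resolve the relative action against the page URL, as a browser does
post_url = urljoin(PAGE_URL, FORM_ACTION)

# POST: the fields are URL-encoded into the request body, so the URL
# carries no trace of the search term
body = urlencode(fields).encode("ascii")
post_req = Request(
    post_url,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(post_req.get_method(), post_req.full_url)
print("body:", post_req.data)
```

The printed URL is the same for every search; only the body changes, which matches what the question observed.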

You need to (a) add this hidden value (hdnAction=2) to your request and (b) submit a POST request, not a GET.
