简体   繁体   English

使用PHP搜寻网站时如何与页面元素互动?

[英]How to interact with page elements while crawling a website with PHP?

I need to go to http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx , enter data in the fields, then click the button to get results. 我需要转到http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx ,在字段中输入数据,然后单击按钮以获取结果。 When taken to the result page, I'm given a table of data but it's paginated into 5 different pages. 当进入结果页面时,我得到了一个数据表,但将其分页为5个不同的页面。

I'm able to do the above using cURL, but it's at this point that I get stuck. 我可以使用cURL来完成上述操作,但是到这一点,我陷入了困境。

Once I'm on the result page, I need to click the "date" header twice to make the data order by decreasing date, then skim off the current day's results. 进入结果页面后,我需要单击“日期”标题两次,以通过减少日期来使数据顺序,然后略过当天的结果。

Any idea how to do this, advanced detail or in concept? 任何想法如何做到这一点,高级细节或概念? Either way should help. 两种方法都应该有所帮助。

Thanks! 谢谢!

The problem is that the click is actually performing a postback using javascript, with the limitations of PHP and cURL you will need to inspect the HTTP headers (GET, POST and COOKIES) being sent by the browser, and emulate them. 问题是单击实际上是使用javascript执行回发,但由于PHP和cURL的限制,您将需要检查浏览器发送的HTTP标头(GET,POST和COOKIES)并进行仿真。 Taking in mind that some values might be session dependent. 请记住,某些值可能取决于会话。 Right now I don't have time to do this for you but I know it can be quite tricky with ASP.Net websites in some cases. 现在,我没有时间为您执行此操作,但是我知道在某些情况下使用ASP.Net网站可能会非常棘手。 There might be easier ways to do it, but that's what it will always come down to, because that's what happens. 可能会有更简单的方法来执行此操作,但这就是它总是会归结于此,因为这就是发生的情况。

If you weren't tied to PHP a whole world of options open - for example, the aggregator in the project I'm working on is actually capable of executing (controlled) javascript specifically for these kinds of tasks/pages (albeit on a grander scale). 如果您不拘泥于PHP,则可以打开很多选项-例如,我正在研究的项目中的聚合器实际上能够执行(控制)针对这些类型的任务/页面的javascript(尽管在更大范围内)规模)。

I can't get a working set of results - if you could post some dummy data that gives results, that would help. 我无法获得一组有效的结果-如果您可以发布一些可以得出结果的虚拟数据,那将会有所帮助。

As a generic answer, you need something that can manipulate the DOM. 作为通用的答案,您需要一些可以操纵DOM的东西。 You can go server-side with something like PHP and Webdriver, or purely client-side with Selenium. 您可以使用PHP和Webdriver之类的服务器端,或者使用Selenium进行纯客户端。 Simulate the click, get the resulting HTML and parse that. 模拟点击,获取生成的HTML并进行解析。

This should work. 这应该工作。 try this. 尝试这个。

$url    ='http://butlercountyclerk.org/bcc-11112005/ForeclosureSearch.aspx';

## do curl , with cookies enabled.


## after do this.

$url    =$url.'?'.'__EVENTTARGET=Search%3AdgSearch%3A_ctl2%3A_ctl1&__EVENTARGUMENT=&__VIEWSTATE=dDwtMjk2Mjk5NzczO3Q8O2w8aTwxPjs%2BO2w8dDw7bDxpPDE%2BOz47bDx0PDtsPGk8Mz47aTwxNz47aTwxOT47PjtsPHQ8dDw7cDxsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0PjtpPDU%2BOz47bDxwPDIwMDY7MjAwNj47cDwyMDA3OzIwMDc%2BO3A8MjAwODsyMDA4PjtwPDIwMDk7MjAwOT47cDwyMDEwOzIwMTA%2BO3A8MjAxMTsyMDExPjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VmlzaWJsZTs%2BO2w8bzx0Pjs%2BPjs%2BOzs%2BO3Q8QDA8cDxwPGw8Q3VycmVudFBhZ2VJbmRleDtQYWdlQ291bnQ7XyFJdGVtQ291bnQ7XyFEYXRhU291cmNlSXRlbUNvdW50O0RhdGFLZXlzOz47bDxpPDA%2BO2k8ND47aTwxMD47aTw0MD47bDw%2BOz4%2BOz47Ozs7Ozs7Ozs7PjtsPGk8MD47PjtsPHQ8O2w8aTwyPjtpPDM%2BO2k8ND47aTw1PjtpPDY%2BO2k8Nz47aTw4PjtpPDk%2BO2k8MTA%2BO2k8MTE%2BOz47bDx0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzNjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8xNjE3NzE0OSAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFNVTlRSVVNUIE1PUlRHQUdFIElOQzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TkFUSEFOSUVMIEdBQkJBUkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDEzNTQgVkFOREVSVkVFUiBBVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0MTU7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMTk2MzQ4ODUgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvMi8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxUSElSRCBGRURFUkFMIFNBVklOR1MgQU5EIExPQU4gQVNTTiBPRiBDTEVWRUxBTkQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEdBWUxFIE5BU0g7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDg5NDEgQ09YIFJEIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTUwMztodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMjY1MTYxMiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS85LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFUgUyBCQU5LIE4gQTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8TE9VSVMgTUlSTUFOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw2OTkxIEdBUlkgTEVFIERSIFdFU1QgQ0hFU1RFUiwgT0ggNDUwNjk7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQ5MjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8yMzk3NTc5MiAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS82LzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEZJRlRIIFRISVJEIE1PUlRHQUdFIENPOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxSQVlNT05EIFNURUlOOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwyMzU5IFRIUlVTSCBBVkUgRkFJUkZJRUxELCBPSCA0NTAxNDs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDM4O2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI0NzgyOTYzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzMvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8V0VMTFMgRkFSR08gQkFOSyBOIEE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEpBTkVUIEJPRUhNOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDw4NjA4IEdPTERGSU5DSCBXQVkgV0VTVCBDSEVTVEVSLCBPSCA0NTA2OTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDQwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI1NTkwMjAzICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8RklGVEggVEhJUkQgQkFOSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VEhFT0RPUkUgQ09PSzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8UE8gQk9YIDE3MTEgV0VTVCBDSEVTVEVSLCBPSCA0NTA3MTs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDkwO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzI2ODY3MDkxICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzYvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Q0lUSUZJTkFOQ0lBTCBJTkM7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPERPTk5BIE1BUkRJUzs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NjU0OSBDQU5BU1RPVEEgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47dDw7bDxpPDA%2BO2k8MT47aTwyPjtpPDM%2BO2k8ND47PjtsPHQ8O2w8aTwwPjs%2BO2w8dDxwPHA8bDxUZXh0O05hdmlnYXRlVXJsOz47bDxDViAyMDExIDA1IDE0Njg7aHR0cDovL3d3dy5idXRsZXJjb3VudHljbGVyay5vcmcvcGEvcGEudXJkL3BhbXcyMDAwLW9fY2FzZV9zdW0%2FMjk4NzU2MDIgICAgICAgICAgICA7Pj47Pjs7Pjs%2BPjt0PHA8cDxsPFRleHQ7PjtsPDUvNS8yMDExOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxDSVRJTU9SVEdBR0UgSU5DOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxNQVRUSEVXIEJMVU5ERUxMOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDwxNDEyIEhFTE1BIEFWRSBIQU1JTFRPTiwgT0ggNDUwMTM7Pj47Pjs7Pjs%2BPjt0PDtsPGk8MD47aTwxPjtpPDI%2BO2k8Mz47aTw0Pjs%2BO2w8dDw7bDxpPDA%2BOz47bDx0PHA8cDxsPFRleHQ7TmF2aWdhdGVVcmw7PjtsPENWIDIwMTEgMDUgMTQzMjtodHRwOi8vd3d3LmJ1dGxlcmNvdW50eWNsZXJrLm9yZy9wYS9wYS51cmQvcGFtdzIwMDAtb19jYXNlX3N1bT8zMjI0MzYxNyAgICAgICAgICAgIDs%2BPjs%2BOzs%2BOz4%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8NS8zLzIwMTE7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPFdFTExTIEZBUkdPIEJBTksgTiBBOz4%2BOz47Oz47dDxwPHA8bDxUZXh0Oz47bDxKT0hOIEJPV01BTjs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8Jm5ic3BcOzs%2BPjs%2BOzs%2BOz4%2BO3Q8O2w8aTwwPjtpPDE%2BO2k8Mj47aTwzPjtpPDQ%2BOz47bDx0PDtsPGk8MD47PjtsPHQ8cDxwPGw8VGV4dDtOYXZpZ2F0ZVVybDs%2BO2w8Q1YgMjAxMSAwNSAxNDYzO2h0dHA6Ly93d3cuYnV0bGVyY291bnR5Y2xlcmsub3JnL3BhL3BhLnVyZC9wYW13MjAwMC1vX2Nhc2Vfc3VtPzQyMjcwMTE5ICAgICAgICAgICAgOz4%2BOz47Oz47Pj47dDxwPHA8bDxUZXh0Oz47bDw1LzQvMjAxMTs%2BPjs%2BOzs%2BO3Q8cDxwPGw8VGV4dDs%2BO2w8VSBTIEJBTksgTkFUSU9OQUwgQVNTT0NJQVRJT047Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPEJSWUFOIFNDSE1JRFQ7Pj47Pjs7Pjt0PHA8cDxsPFRleHQ7PjtsPDI4OTUgV0VFUElORyBXSUxMT1cgRFJJVkUgSEFNSUxUT04sIE9IIDQ1MDExOz4%2BOz47Oz47Pj47Pj47Pj47Pj47Pj47Pj47PtVTse1TdIXrxq%2FXrY%2Fp22QQ7pAh&Search%3AddlMonth=5&Search%3AddlYear=2011&Search%3AtxtCompanyName=&Search%3AtxtLastName=&Search%3AtxtCaseNumber=';

## DO curl with cookies on again

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM