
Python get request and retrieving data from search

I'm trying to use the requests module to retrieve data from this website: https://toelatingen.ctgb.nl/

For example, when I enter "11462" in the "Zoekterm" field, I want to receive the data the search finds.

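import requests

# Post the search term to the site root (this attempt does not return the results).
data = {"searchTerm": "11462"}
session = requests.Session()
r = session.post("https://toelatingen.ctgb.nl/", data=data)

body_data = r.text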

Unfortunately, body_data does not contain the information I searched for.

Thanks for helping me.

You're not getting the response data because the site doesn't perform the search at that URL. Instead, it makes a call to https://toelatingen.ctgb.nl/nl/admissions/overview .

When you're trying to get information off the internet, the first thing to do is check how your web browser is getting the data. If you open the inspection tool that comes with your browser (the hotkey is typically Ctrl+Shift+I), you should find a Network tab that tracks the requests the browser makes and the responses it receives. With that open, get your browser to display the information you want and watch the Network tab while it does so. Check the responses that come up to find the one that has the information you want, and then replicate the request your browser used.
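As a rough sketch of the "replicate the request" step, the snippet below builds a request with the requests module and prints what would be sent, so it can be compared line by line with the entry in the Network tab. The HTTP method, the header values, and the searchTerm field name are assumptions for illustration (the field name is taken from the question); copy the real values from your own browser.

```python
import requests

# Assumed values: replace these with what the Network tab actually shows.
URL = "https://toelatingen.ctgb.nl/nl/admissions/overview"
HEADERS = {
    "Accept": "application/json",
    "X-Requested-With": "XMLHttpRequest",
}
FORM = {"searchTerm": "11462"}


def build_request(url, headers, form):
    """Prepare (but do not send) a POST request so it can be inspected."""
    return requests.Request("POST", url, headers=headers, data=form).prepare()


prepared = build_request(URL, HEADERS, FORM)

# Print the request in a form that is easy to compare with the browser's.
print(prepared.method, prepared.url)
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
print(prepared.body)
```

Once the printed request matches the browser's, it can be sent with requests.Session().send(prepared), and if the response is JSON it can be decoded with the response's .json() method.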

In your case:

  • The root page first loads an empty page from https://toelatingen.ctgb.nl/
  • It then loads a bunch of static files (mostly woff and js; these are used for styling the webpage and handling different procedures)
  • Then it makes a call to https://toelatingen.ctgb.nl/nl/admissions/overview . We can be pretty sure this is the call we want, because the response is JSON that contains the information we see displayed on the screen.
  • We then copy all the information from that request (headers and form data, line for line), plug it in, and see whether the requests module returns the same JSON.
  • If it doesn't, that most likely means we're missing something (most often a CSRF token or a special Accept-Encoding) and we need to do some more tinkering.
  • I would also recommend taking a little time to prune the request data and headers: most of the time they contain extra entries that the server doesn't actually need. This will save space and give you a better idea of which parts of the request you can change.
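The pruning step in the last bullet can be sketched as a small loop that drops one header at a time and keeps it only if the response changes without it. This is a minimal illustration, not a tested recipe for this site: prune_headers, fake_send, and the sample headers are hypothetical, and fake_send stands in for a real server call so the example runs offline.

```python
def prune_headers(headers, send, expected):
    """Drop each header in turn; keep it only if the response changes without it.

    `send` is a hypothetical callable that takes a headers dict and returns
    the response body; `expected` is the body obtained with the full headers.
    """
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if send(trial) == expected:
            needed = trial  # the server did not miss this header
    return needed


# Stand-in for a real server call: this fake server only answers
# correctly when the Accept header is present.
def fake_send(headers):
    return "results" if "Accept" in headers else "error page"


# Illustrative headers, as they might be copied from the Network tab.
browser_headers = {
    "Accept": "application/json",
    "Accept-Language": "nl,en;q=0.8",
    "Referer": "https://toelatingen.ctgb.nl/",
}
print(prune_headers(browser_headers, fake_send, "results"))
# prints {'Accept': 'application/json'}
```

With the requests module, send could be something like lambda h: session.post(url, headers=h, data=form).text, where session, url, and form are whatever you copied from the browser. If the response contains values that change on every call, compare only the fields you care about rather than the whole body.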
