
Fetching data from third-party websites

I work in a small healthcare-related office, and we often have to look up license numbers and other official identifiers for physicians. We use free, publicly available websites to do so. I've been tasked with finding a way to enter a physician's name once and get back the results from all of those websites in a single view, to cut the time spent going through each site. I'm familiar with JavaScript, PHP and Ruby, but I'm by no means an expert. My question is: where should I start? I don't need anyone to write the code for me, but I can't seem to form the right question to google for answers. I'm fairly sure this is possible; I'm just not sure where to start developing the idea. Any help would be appreciated.

It sounds like you need to do some screen scraping, which may or may not be allowed by the terms and conditions of the sites you're using, so check that first.

If there are no restrictions on automated retrieval and querying, read up on PHP's cURL module and simulate the form submissions that happen when you query the sites manually. Your browser's developer console shows which scripts and pages are requested when you run a query, which is quicker than working it out from the page source.
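To make "simulating the form action" concrete, here is a minimal sketch of the idea. It is written in Python's standard library rather than PHP's cURL, purely as an illustration. The URL `https://licensing.example.org/search` and the field names `first_name` and `last_name` are hypothetical; the real ones are whatever the developer console shows when you submit the site's own search form.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace with the URL the real form posts to,
# as seen in the Network tab of the browser's developer console.
SEARCH_URL = "https://licensing.example.org/search"


def build_search_request(first_name: str, last_name: str) -> Request:
    """Build the same kind of POST a browser sends for a simple search form."""
    # Hypothetical field names: use the names the real form submits.
    form_fields = {"first_name": first_name, "last_name": last_name}
    body = urlencode(form_fields).encode("utf-8")
    return Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_html(request: Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html(build_search_request("Jane", "Doe"))` would then return the raw results page for that site. Some sites also expect cookies or hidden form fields, which this sketch does not handle.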

You'll get back the HTML of the result pages, which you'll need to parse. Depending on how each page is laid out, a few simple regexes might do the trick, but you'll likely need to tailor them to each site you query.
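As a rough illustration of the regex approach, the sketch below pulls license numbers out of a made-up results table. The markup (`<td class="license">`) and the number format are assumptions invented for the example, and Python is again used only for illustration; each real site will need its own pattern, written after looking at the HTML it actually returns.

```python
import re
from typing import List

# Hypothetical pattern for one site whose result cells look like
# <td class="license">A12345</td>. Every site needs its own pattern.
LICENSE_PATTERN = re.compile(r'<td class="license">\s*([A-Z0-9-]+)\s*</td>')


def extract_license_numbers(html: str) -> List[str]:
    """Return every license number found in the page, in document order."""
    return LICENSE_PATTERN.findall(html)


# Made-up sample of what a results page might contain.
sample_html = """
<table>
  <tr><td class="name">Jane Doe</td><td class="license"> A12345 </td></tr>
  <tr><td class="name">John Doe</td><td class="license">B-67890</td></tr>
</table>
"""

print(extract_license_numbers(sample_html))  # ['A12345', 'B-67890']
```

Patterns like this break as soon as a site changes its markup, so it helps to keep one small extraction function per site and to treat an empty result as a possible sign that the page layout has changed.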

Again, please double-check that the sites you're using allow scripted queries. If you're in any doubt, email them, explain what you plan to do, and ask whether they're OK with it.

