Retrieving information from a web page

My application is meant to speed up the retrieval of phone call information from our telephone system. The best way to get this information is to create a new search on the telephone system's web interface and export the results to an Excel spreadsheet, which my application then imports into a DataSet.

To get the export, starting from the login screen, the process is as follows:

  • Log in
  • Navigate to the Reports page
  • Click the "Extension Detail" link
  • Select the "Extensions" CheckBox
  • Select the extensions (typically all the ones currently in use) from the listbox
  • Specify the date range
  • Click the Export button

It's not a big job to do manually every day, but for reliability it would be great if my application could do this automatically the first time it starts each day. Since more than one person in the company is going to use this application, having a Windows Service do it would be even better.

I don't know if it will help, but the system is the Datatex Topaz Next Generation telephone management system: http://www.datatex.co.za/downloads/index.html#TNG

Can anyone give me a basic idea of how to do this?

Also, can anyone post links (in comments if need be) to pages where I can learn more about how to do this?

I have done something similar to fetch info from a website. I cannot give you an exact answer, but the idea is to send the login info to the page as form values. If the site relies on cookies, you can use this cookie-aware WebClient:

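using System;
using System.Net;

// WebClient that keeps cookies between requests, so a login session survives.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookieContainer = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests carry cookies; attach the shared container to them.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.CookieContainer = cookieContainer;
        }
        return request;
    }
}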

You should be aware that some sites rely on a session ID being passed, so the first thing I did was fetch the session ID from the page:

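var client = new CookieAwareWebClient();
client.Encoding = Encoding.UTF8; // Encoding lives in System.Text

// indexPageUrl is a placeholder for the URL of the site's index page
var indexHtml = client.DownloadString(indexPageUrl);

// fetchSessionID is your own helper that extracts the session ID from the HTML
string sessionID = fetchSessionID(indexHtml);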
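
The fetchSessionID helper is not shown above, and how it should work depends entirely on the page's markup. As a rough sketch only, assuming the session ID sits in an input element named "sessionid" with the name attribute written before the value attribute, it could look like the following. The class name SessionHelper and the pattern are hypothetical; check the real markup with "view source" and adjust.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class SessionHelper
{
    // Assumption: the page contains something like
    //   <input type="hidden" name="sessionid" value="...">
    // with name before value. Adjust the pattern to the real markup.
    private static readonly Regex SessionIdPattern = new Regex(
        "<input[^>]*name=[\"']sessionid[\"'][^>]*value=[\"']([^\"']*)[\"']",
        RegexOptions.IgnoreCase);

    // Returns the session ID found in the HTML, or null when the pattern does not match.
    public static string FetchSessionID(string html)
    {
        Match match = SessionIdPattern.Match(html);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : null;
    }
}
```

Returning null instead of throwing lets the caller decide what to do when the page layout has changed and the ID can no longer be found.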

Then I had to log in, which you can do by uploading form values to the page. You can find the specific form elements with "view source", but you need to know a little HTML to do so.

// NameValueCollection lives in System.Collections.Specialized
var values = new NameValueCollection();
values.Add("sessionid", sessionID); // Fetched session ID
values.Add("brugerid", args[0]); // Username field in my case
values.Add("adgangskode", args[1]); // Password field in my case
values.Add("login", "Login"); // The login button

// Log in by posting the form values; loginUrl is a placeholder for the login form's URL
client.UploadValues(loginUrl, values); // If all goes well, I'm logged in now

And then I could download the page I needed. In your case you may use DownloadFile(...) if the file always has the same URL (something like Export.aspx?From=2010-10-10&To=2010-11-11), or UploadValues(...), where you specify the values as before but save the result.

// pageUrl is a placeholder for the URL of the page you need
string html = client.DownloadString(pageUrl);
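
To make the two export options a bit more concrete, here is a minimal sketch. The Export.aspx URL shape and the From/To parameter names are only the made-up example from above, not the real interface of your telephone system, and the names ExportDownloader, BuildExportUrl, DownloadExport and PostAndSave are hypothetical. Use Fiddler to find out what the real export request looks like.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;
using System.IO;
using System.Net;

public static class ExportDownloader
{
    // Builds a URL such as baseUrl?From=2010-10-10&To=2010-11-11.
    // Assumption: the export accepts the date range as query parameters.
    public static string BuildExportUrl(string baseUrl, DateTime from, DateTime to)
    {
        const string format = "yyyy-MM-dd";
        return baseUrl
            + "?From=" + from.ToString(format, CultureInfo.InvariantCulture)
            + "&To=" + to.ToString(format, CultureInfo.InvariantCulture);
    }

    // Option 1: the export has a stable URL, so a plain GET saved to disk is enough.
    public static void DownloadExport(WebClient client, string exportUrl, string targetPath)
    {
        client.DownloadFile(exportUrl, targetPath);
    }

    // Option 2: the export is produced by a form post; post the values and save the response body.
    public static void PostAndSave(WebClient client, string formUrl, NameValueCollection values, string targetPath)
    {
        byte[] response = client.UploadValues(formUrl, values);
        File.WriteAllBytes(targetPath, response);
    }
}
```

Pass in the same CookieAwareWebClient instance that was used for the login, so the session cookie goes along with the export request.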

It seems you have a lot more steps than I did, but the principle is the same. To see what values are sent to the site when you log in etc., you can use a program such as Fiddler (Windows), which captures the HTTP activity. Essentially you just do exactly the same thing, but watch out for the session ID etc., which is temporary.

The best idea is really to use some native way to fetch the data, but if you don't have the code, database etc., you have to do it the ugly way. You may also need an HTML parser to fetch the data (oops, you don't, because you export to a file). And last but not least, keep in mind that pages can change, so there is great potential for the login, parsing etc. to fail.

Please ask if you are uncertain about what is going on.

ADDITION

The CookieAwareWebClient is not my code:

I also found some relevant threads:

With an HTTP client, you need to do the following:

  • Log in, using cookies or HTTP authentication
  • Request a page
  • Submit form data

This means that you need some class or component in your program that can do HTTP, cookies, authentication and forms. With this, you make the same requests a user would make.
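
One component that covers HTTP, cookies and forms in .NET is HttpClient together with HttpClientHandler. The sketch below only shows the setup; the form field names "username" and "password" and the names HttpSessionSketch, CreateClient and BuildLoginForm are hypothetical, so replace the field names with whatever the real login form uses.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

public static class HttpSessionSketch
{
    // Creates a client that stores cookies in the given container,
    // so a session cookie set at login is sent with later requests.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true
        };
        return new HttpClient(handler);
    }

    // Builds the body of a login form post.
    // Assumption: the form fields are called "username" and "password".
    public static FormUrlEncodedContent BuildLoginForm(string user, string password)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("username", user),
            new KeyValuePair<string, string>("password", password)
        });
    }
}
```

With these in place, logging in is a PostAsync of the form content to the login URL, and each later step is a GetAsync or PostAsync on the same client, which mirrors the requests a browser would send.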
