How do I fill in a website form and retrieve the result in C#?

I would like my program to be able to access a website that processes string input and returns some information about it. I want to input two sequences, submit them and read the result through the program. The website is the following:

http://scansite.mit.edu/motifscan_seq.phtml

If you enter, say, 5031601 as the Protein Name and DRNAYVWTLKGRTWKPTLVILRI as the Sequence, you are redirected to a results page. That results page is what I want my program to read. I have researched this a lot, but I haven't found a workable solution.

Can anyone please help me out?


EDIT:

I tried to create a web request with the following code (adapted from the link):

        WebRequest request = WebRequest.Create(
                                   "http://scansite.mit.edu/motifscan_seq");
        request.Method = "POST";
        string postData = @"motif_option=all&protein_id=5031601&
                           sequence=DRNAYVWTLKGRTWKPTLVILRI&
                           stringency=High&submit=Submit Request";
        byte[] byteArray = Encoding.UTF8.GetBytes(postData);
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = byteArray.Length;
        Stream dataStream = request.GetRequestStream();
        dataStream.Write(byteArray, 0, byteArray.Length);
        dataStream.Close();

        using (WebResponse response = request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        using (StreamReader sr = new StreamReader(resStream))
            File.WriteAllText("SearchResults.html", sr.ReadToEnd());
        System.Diagnostics.Process.Start("SearchResults.html");

When I open SearchResults.html, it contains the original form page with the protein name filled in. The sequence has not been filled in (it is a textarea, not a text box), and the form has not been submitted. Am I missing something or doing something wrong?


I resolved the issue by sending the request to the URI given in the action attribute of the form tag (http://scansite.mit.edu/cgi-bin/motifscan_seq) instead of the URL of the page that displays the form.

Your question is a bit vague, but it sounds like you want to do screen scraping. That means you download the HTML of the page and parse it to extract the values you want.

The site in question takes a POST request to the following URL:

http://scansite.mit.edu/cgi-bin/motifscan_seq

With the following parameters:

motif_option: all
protein_id:   5031601
sequence:     DRNAYVWTLKGRTWKPTLVILRI
stringency:   High
submit:       Submit Request

What you have to do is send a POST request to that URL and pass in the same keys, but with your own values. Here's some documentation on how to do that with C# (look at the example halfway down the page):

http://msdn.microsoft.com/en-us/library/debx8sh9.aspx
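
As a rough sketch of the request described above, the code below posts the five form fields to the form's action URL with the same WebRequest API used in the question. The class and method names (ScansiteForm, BuildFormBody, PostForm, Submit) are made up for illustration. It assumes the server still accepts these field names and is reachable at that address, which may no longer be true. One detail to watch: a verbatim string (@"...") that spans several lines keeps its line breaks and indentation, so a body written that way sends keys with leading whitespace. Building the body from pairs and escaping each part avoids that, and also encodes the space in "Submit Request".

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper class; the names are illustrative, not part of any library.
public static class ScansiteForm
{
    // The action URL of the form, as given in the thread.
    public const string ActionUrl = "http://scansite.mit.edu/cgi-bin/motifscan_seq";

    // Joins key/value pairs into an application/x-www-form-urlencoded body,
    // percent-encoding every key and value.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Sends the body as a POST request and returns the response text.
    public static string PostForm(string url, string body)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(body);

        WebRequest request = WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = bytes.Length;

        using (Stream requestStream = request.GetRequestStream())
            requestStream.Write(bytes, 0, bytes.Length);

        using (WebResponse response = request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(responseStream))
            return reader.ReadToEnd();
    }

    // Fills in the fields listed above with the caller's own values.
    public static string Submit(string proteinId, string sequence)
    {
        var fields = new[]
        {
            new KeyValuePair<string, string>("motif_option", "all"),
            new KeyValuePair<string, string>("protein_id", proteinId),
            new KeyValuePair<string, string>("sequence", sequence),
            new KeyValuePair<string, string>("stringency", "High"),
            new KeyValuePair<string, string>("submit", "Submit Request"),
        };
        return PostForm(ActionUrl, BuildFormBody(fields));
    }
}
```

Calling ScansiteForm.Submit("5031601", "DRNAYVWTLKGRTWKPTLVILRI") should then return the HTML of the results page as a string, provided the site responds as it did when this was written.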

When you get the HTML back, you will need to parse it and find the relevant parts that you need. Unfortunately, there are no IDs or classes in the HTML and everything is made from tables, so this might be quite challenging. Here is another question that covers screen scraping in C#:

Screen Scraping HTML with C#
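
Because the results are laid out in plain tables without IDs or classes, one simple starting point is to pull the text out of every table row and then pick the rows you need by position or by their header text. The sketch below does that with regular expressions from the standard library. The class name TableScraper is made up, and the approach assumes simple, non-nested tables. The real structure of the results page is not shown in this thread, so treat this only as a starting point; a proper HTML parser would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class TableScraper
{
    // Returns one list of cell texts per <tr> element found in the HTML.
    // Assumes tables are not nested; nested tables would confuse the patterns.
    public static List<List<string>> ExtractRows(string html)
    {
        var options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
        var rows = new List<List<string>>();

        foreach (Match row in Regex.Matches(html, @"<tr\b[^>]*>(.*?)</tr>", options))
        {
            var cells = new List<string>();
            foreach (Match cell in Regex.Matches(
                row.Groups[1].Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", options))
            {
                // Drop any inline tags inside the cell, then decode entities.
                string text = Regex.Replace(cell.Groups[1].Value, "<[^>]+>", "");
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }
            if (cells.Count > 0)
                rows.Add(cells);
        }
        return rows;
    }
}
```

You would pass the HTML string returned by the POST request to TableScraper.ExtractRows and then inspect the rows to find the ones that hold the values you are after.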
