简体   繁体   中英

How to programmatically verify login credentials for a web form?

I'm building an app to let users export data from a university system. Currently, they can log in and see the data in HTML, but I would like to let people download it as CSV.

I have an app where users supply their username and password. I would like to log in to the university system and HTML scrape the resulting page. How can I do this?

I'm building a GWT app. I could either do this in Java-transliterated-JS on the client, or Java on the server.

Update : Selenium might be nice, but it looks like overkill.

You're going to have to do this from the server unless the domains are the same. You'd need to determine what the POST transaction used by the other server for the login step looks like - parameter names etc. Then you'd perform that operation and do whatever you want with what comes back. If you need to see multiple pages, you need to maintain the appropriate session cookie too so that the server knows you're still logged in on the subsequent HTTP requests.

If you have to hit another site to validate the credentials, then I'm not so sure that people should feel comfortable providing those credentials to you. That is, if you don't have rights to check the credentials directly, why are you trustworthy to receive them? I know sometimes people need to integrate with a system they don't own, so this is just a question.

First, this has to be done server-side because of the limitations on client scripting due to the same origin policy .

The typical way of handling the "screen scraping" you mention is to treat the web page as if it was an XML service. First, examine the source code of the page, then using an internet/HTTP stack, craft a POST to the correct URL and read the response using a standard XML library. It will take some ingenuity to come up with a good way to dig into the XML to find the piece you need that will be as insulated as possible from changes to the page. Keep in mind that your system can break any time that the owners of the site change their page.

Sometimes, you can't just send the POST but have to request the blank page initially in order to get hidden form values that need to be returned in the POST. You'll have to experiment to find out what it requires.

Additionally, you probably have to handle cookies as well, since they usually are an integral part of the web site's authentication and session management (though you might get lucky that the session doesn't matter between the initial POST and the first response).

Last, you may be unlucky enough that the site uses javascript to do part of the authentication work, which may require additional digging to understand how the credentials are posted to the site.

There are other potential barriers such as the site checking to see that the referrer is their own site, possible use of SSL (HTTPS) and so on.

I'm pretty sure that the protection against cross-site scripting in web browsers will mean that you can't log in to the university's app using javascript running in the web browser. So the part of your program that fetches data from the university will need to run on your server. Once you have the data, you can process it either on your server or in javascript in the browser, but I think it would be easier to do it on the server.

See http://en.wikipedia.org/wiki/Same_origin_policy

I'm not too sure about GWT, but in general, you would take the form data submitted by the user, check it against a database of username and hashed passwords. If the database checks out, set a session cookie that says the user is logged in.

In your pages, check if the session cookie say the user is logged in. If not, redirect to login page, otherwise allow them to view the pagfe.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM