Logging into a PHP-based site and scraping data: problems
I am creating a third-party Java desktop application that needs to connect to a PHP-based site and log in to gather pertinent data. There is no accessible web service and no API, and every user will have their own secure login.
The site uses Dojo (if that matters), and I am using Java HttpClient to send the POST.
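```java
import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

// Create a new HttpClient and POST request
HttpClient httpclient = new DefaultHttpClient();
HttpPost httppost = new HttpPost("https://thewebsite.net/index/login"); // .php ?
// Initialize the response string
String nextpage = "";
try {
    // Add the form fields as name/value pairs
    List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(3);
    nameValuePairs.add(new BasicNameValuePair("login", "USER"));
    nameValuePairs.add(new BasicNameValuePair("pass", "PASSWORD"));
    nameValuePairs.add(new BasicNameValuePair("Submit", ""));
    httppost.setEntity(new UrlEncodedFormEntity(nameValuePairs));
    HttpResponse response = httpclient.execute(httppost);
    // Read the response body so it can be printed and parsed
    nextpage = EntityUtils.toString(response.getEntity());
    System.out.println(nextpage);
    httppost.releaseConnection();
}
```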
...
Now, the issue I'm having is that the response I get back is a Dojo validation script for the user/pass fields:
```html
<script type='text/javascript'>
dojo.require("dojox.validate._base");
function validate_RepeatPassword(val, constraints)
{
    var isValid = false;
    if(constraints) {
        var otherInput = dijit.byId(constraints[0]);
        if(otherInput) {
            var otherValue = otherInput.value;
            isValid = (val == otherValue);
        }
    }
    return isValid;
}
</script>
```
I simply want to connect, parse an HTML response, and close the connection.
When I use Firebug, I see this as the POST request, but I can't seem to reproduce it:

  Referer: https://thewebsite.net/index/login
  Source: login=USER&pass=PASSWORD
When I use HttpPost to construct a direct POST URL without name/value pairs:

```java
HttpPost httppost = new HttpPost("https://thewebsite.net/index/login?login=USER&pass=PASSWORD");
```

I get an error response stating "the user and pass fields cannot be left blank."
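The Firebug capture sends `login` and `pass` in the request body, while the failing attempt puts them in the URL's query string. If the PHP handler reads its fields from `$_POST` (an assumption, but one that would explain the "cannot be left blank" error), query-string values never reach it. The sketch below uses the JDK's `java.net.http` client (Java 11+) rather than Apache HttpClient, only to show the shape of the request Firebug recorded: a form-encoded body plus the `Referer` header. The class and method names are hypothetical; the URL and field names come from the question.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FormLogin {
    // Encode fields as application/x-www-form-urlencoded, the body format Firebug shows.
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) a POST whose credentials travel in the body, not the URL.
    static HttpRequest loginRequest(String user, String pass) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("login", user);
        fields.put("pass", pass);
        return HttpRequest.newBuilder(URI.create("https://thewebsite.net/index/login"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Referer", "https://thewebsite.net/index/login")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
    }
}
```

To actually log in, the request would be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; any later request for the data also has to carry the session cookie returned by the login.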
My question is: is there a simpler, direct way to log in that I'm missing, one that will let me successfully get past the login?
Thanks, I love the SO community; hope you can help.
I think the best library for doing this is jsoup:

```java
import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Jsoup;

Connection.Response res =
        Jsoup.connect("https://thewebsite.net/index/login?login=USER&pass=PASSWORD")
                .method(Method.POST)
                .execute();
```
After this you also need to handle verification: read the cookies, request parameters, and header parameters from the response and send them with the following requests, and this will work.
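Reading the cookie and sending it back by hand works; an alternative (a swapped-in technique, not what this answer describes) is to give the client a cookie jar so the session cookie from the login response is replayed automatically. Below is a minimal sketch with the JDK's `java.net.CookieManager` and `java.net.http.HttpClient` (Java 11+), assuming the site tracks the logged-in session with a cookie. `UNIQUE_NAME` is the placeholder cookie name used elsewhere in this thread, and the class and method names are hypothetical. The second method runs offline: it feeds a simulated `Set-Cookie` header into the jar and reports which `Cookie` header the next request would carry.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.util.List;
import java.util.Map;

public class SessionCookies {
    // A client that stores Set-Cookie values from the login response and
    // sends them back on later requests to the same host.
    static HttpClient newSessionClient(CookieManager jar) {
        return HttpClient.newBuilder()
                .cookieHandler(jar)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Offline check: simulate the login response's Set-Cookie header, then ask
    // the jar which Cookie header a request for the data URL would carry.
    static List<String> cookieHeaderAfterLogin(String setCookie) {
        CookieManager jar = new CookieManager();
        try {
            jar.put(URI.create("https://thewebsite.net/index/login"),
                    Map.of("Set-Cookie", List.of(setCookie)));
            Map<String, List<String>> headers = jar.get(
                    URI.create("https://thewebsite.net/datarequest/data"), Map.of());
            return headers.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the real flow, one `CookieManager` would be created, the client built with `newSessionClient(jar)`, and both the login POST and the data request sent through that same client.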
I didn't end up using your exact code (with the parameters in the URL), but jsoup was the fix. Here's what I used:
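```java
import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Log in: send the credentials as POST form data (User and Pass hold them)
Connection.Response res = Jsoup.connect("https://thewebsite.net/index/login")
        .data("login", User).data("pass", Pass)
        .userAgent("Chrome").method(Method.POST).execute();
// Then grab the session cookie and send the next POST for the data
Document t = res.parse(); // for later use
String SessionID = res.cookie("UNIQUE_NAME");
// The JSON; ignoreContentType lets jsoup accept a non-HTML response
Connection.Response driverx = Jsoup.connect("https://thewebsite.net/datarequest/data")
        .cookie("UNIQUE_NAME", SessionID).userAgent("Chrome")
        .ignoreContentType(true).method(Method.POST).execute();
```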