
Getting data from a webpage with login requirement (Java)

So recently I decided to teach myself how to get data from webpages. I managed to get JSON data from a different webpage, but when I try to copy everything from this website, the result doesn't contain the data I actually need.

The page I am trying is, for example: http://www.tremorgames.com/index.php?action=shop&page=2 (you might need to register). The data I am trying to get is, for example, the game name, price, or stock; if I can get one, I will be able to get them all.

The problem is that the browser's dev tools show the code, but when I try to copy everything to a file using Java, most of that code is missing.

(I tried with Jsoup as well and it doesn't work either.) This is what I have for copying from webpages:

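import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// Reads the raw HTML the server sends to an anonymous (not logged-in) request.
// The method name is illustrative; the original snippet showed only the body.
static String readPage() throws IOException {
    URL url = new URL("http://www.tremorgames.com/index.php?action=shop&page=2");
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
        StringBuilder buffer = new StringBuilder();
        char[] chars = new char[1024];
        int read;
        while ((read = reader.read(chars)) != -1) {
            buffer.append(chars, 0, read);
        }
        return buffer.toString();
    }
}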

And as I said, I am trying to learn, so any pointers are welcome. (I searched for a while before I gave up and wrote the rest of the code.)

Thanks in advance.

Okay, so I finished this a while ago but forgot to answer my own question. I used HtmlUnit for this because it looked like the simplest option.

import com.gargoylesoftware.htmlunit.WebClient;  
import com.gargoylesoftware.htmlunit.html.HtmlInput;  
import com.gargoylesoftware.htmlunit.html.HtmlPage;  
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;

In order to get data from that webpage, I needed to log into the website first. For this I needed to start a web client. The thing to remember is that you must keep using the same web client, so you will need to create the WebClient in the method that calls the login method. (That method will later also pass the WebClient on to get the data and anything else you might need.)

WebClient webClient = new WebClient(); //Initiate a WebClient variable.  
webClient = tremorLogin(webClient);
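The reason the same client has to be reused is that the login is remembered through cookies held by the client. As a rough illustration only, the sketch below uses the JDK's java.net.CookieManager as a stand-in for the cookie storage inside a WebClient; the cookie name and value are made up, and no real request is sent.

```java
import java.net.CookieManager;
import java.net.HttpCookie;
import java.net.URI;

class SharedSessionDemo {
    // Counts the cookies a client has stored for the given address.
    static int cookiesFor(CookieManager client, String address) {
        return client.getCookieStore().get(URI.create(address)).size();
    }

    // Simulates the server handing out a session cookie after a login.
    // The cookie name and value are invented for this illustration.
    static void simulateLogin(CookieManager client) {
        HttpCookie session = new HttpCookie("session", "abc123");
        session.setPath("/");
        client.getCookieStore().add(URI.create("http://www.tremorgames.com/"), session);
    }

    public static void main(String[] args) {
        String shopPage = "http://www.tremorgames.com/index.php?action=shop&page=2";

        CookieManager loggedIn = new CookieManager();
        simulateLogin(loggedIn);
        CookieManager fresh = new CookieManager();

        // The client that logged in still holds the cookie; a new one does not.
        System.out.println("same client:  " + cookiesFor(loggedIn, shopPage) + " cookie(s)");
        System.out.println("fresh client: " + cookiesFor(fresh, shopPage) + " cookie(s)");
    }
}
```

A freshly created client starts with an empty cookie store, which is why creating a second WebClient after logging in would request the shop page as an anonymous visitor again.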

Then in tremorLogin I log into the website and return the client to the webClient variable.

//Login into Tremor Games and return the client(Saves the cookies).
private static WebClient tremorLogin(WebClient webClient) throws Exception
{
    webClient.getOptions().setJavaScriptEnabled(false);
    HtmlPage currentPage = webClient.getPage("http://www.tremorgames.com/"); //Load page at the STRING address.
    HtmlInput username = currentPage.getElementByName("loginuser"); //Find element called loginuser for username
    username.setValueAttribute(user); //Set value for username
    HtmlInput password = currentPage.getElementByName("loginpassword"); //Find element called loginpassword for password
    password.setValueAttribute(pass); //Set value for password
    HtmlSubmitInput submitBtn = currentPage.getElementByName("Submit"); //Find element called Submit to submit form.
    currentPage = submitBtn.click(); //Click on the button.

    return webClient;
}

loginuser is the name of the username text field, which you can find by checking the website's source code.

HtmlInput username = currentPage.getElementByName("loginuser");

loginpassword is the name of the password text field, which you can find the same way.

HtmlInput password = currentPage.getElementByName("loginpassword");

user is your username (String type) and pass is your password (String type).

username.setValueAttribute(user);  
password.setValueAttribute(pass);

After filling in the username and password you need to click the submit button. For this you need to find the name of the button in the website's source code (the same way as for the username and password text fields). Once you have found the name of the button, you click on it, which is what the second line does.

HtmlSubmitInput submitBtn = currentPage.getElementByName("Submit"); //Find element called Submit to submit form.
currentPage = submitBtn.click(); //Click on the button.
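The element names matter because a form submission sends its fields keyed by those names. As a rough sketch of what the click might put in the request body, the example below encodes the three field names from the code above in the standard URL-encoded form format. It assumes the form uses the default application/x-www-form-urlencoded encoding, and the credentials and the button value "Login" are placeholders, not values taken from the site.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class LoginFormBody {
    // Encodes name/value pairs as name1=value1&name2=value2 with URL escaping.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("loginuser", "myUser");         // placeholder username
        fields.put("loginpassword", "p@ss word");  // placeholder password
        fields.put("Submit", "Login");             // button value is an assumption
        System.out.println(encode(fields));
    }
}
```

If a field name in the code does not match the name in the page source, the server never receives that value, which is why getElementByName has to be given the exact names.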

Once you return this, the logged-in web client is available in the original method, and you can use it to get the data or anything else you might want from the website. In the original method you might have something like:

HtmlPage currentPage = webClient.getPage("http://www.tremorgames.com/index.php?action=shop&searchterm=steam&search_category=5&sort=price_asc&page=1");
String pageSource = currentPage.asXml();

Once you have the page as XML in pageSource, you will have the same text/code you see in developer tools, and you just need to search through it for the data you need.
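Searching through pageSource could look something like the sketch below. The markup in the sample string is invented for illustration (the real element and class names have to be read from the actual page source), and the regular expression only suits simple, regular markup like this. HtmlUnit's own DOM methods such as getByXPath, or an HTML parser, are likely to be more robust.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShopItemExtractor {
    // Matches a name span followed by a price span, tolerating the
    // whitespace and line breaks that pretty-printed XML contains.
    private static final Pattern ITEM = Pattern.compile(
            "<span class=\"name\">\\s*(.*?)\\s*</span>\\s*"
            + "<span class=\"price\">\\s*(\\d+)\\s*</span>",
            Pattern.DOTALL);

    // Returns game name -> price, in the order the items appear in the page.
    static Map<String, Integer> extract(String pageSource) {
        Map<String, Integer> items = new LinkedHashMap<>();
        Matcher m = ITEM.matcher(pageSource);
        while (m.find()) {
            items.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return items;
    }

    public static void main(String[] args) {
        // Hypothetical markup, standing in for the result of asXml().
        String pageSource =
                "<div class=\"item\">\n"
                + "  <span class=\"name\">\n    Example Game One\n  </span>\n"
                + "  <span class=\"price\">\n    1200\n  </span>\n"
                + "</div>\n"
                + "<div class=\"item\">\n"
                + "  <span class=\"name\">\n    Example Game Two\n  </span>\n"
                + "  <span class=\"price\">\n    850\n  </span>\n"
                + "</div>\n";

        extract(pageSource).forEach((name, price) ->
                System.out.println(name + " costs " + price));
    }
}
```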

Hope this helps and saves people some time.
