I want to use Jsoup to crawl a page that is only available when I am signed in. I guess that means I need to sign in on one page and send the cookies to another page.
I read some earlier posts here and wrote the following code:
public static void main(String[] args) throws IOException {
    // Log in first; Jsoup.connect needs an absolute URL including the scheme
    Connection.Response res = Jsoup.connect("https://login.yahoo.com")
            .data("login", "myusername", "passwd", "mypassword")
            .method(Method.POST)
            .execute();
    Document doc = res.parse();
    String sessionId = res.cookie("SESSIONID");

    // Request the protected page, sending the session cookie along
    Document doc2 = Jsoup.connect("http://health.groups.yahoo.com/group/asthma/messages")
            .cookie("SESSIONID", sessionId)
            .get();
    Elements Eles = doc2.getElementsByClass("message");
    String content = Eles.first().text();
    System.out.println(content);
}
My question is: how can I find out the name of the cookie (i.e. "SESSIONID" here) that carries my login session? I used the .cookies() method to get all the cookies from the login page:
B
DK
YM
T
PH
Y
F
I tried them one by one, but none worked. I could get a value for sessionId from some of them, but I could not get any nodes from the second page, which means I was not actually signed in. Could anyone give me some suggestions? Many thanks!
I've struggled with logging in to websites with Jsoup as well.
What I came up with was a hybrid of Selenium WebDriver and Jsoup.
WebDriver can remote-control a browser; it is typically used for testing purposes.
For my application, it was not desirable to have a visible browser messing about on the screen, so I used the "silent" WebDriver, HtmlUnitDriver, instead. You can instantiate it with this line of code:
HtmlUnitDriver driver = new HtmlUnitDriver(true); // true enables JavaScript support (using Rhino, I believe)
Now, to log in to a website I use:
String baseUrl = "http://www.thesite.com";
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
driver.get(baseUrl);
driver.findElement(By.id("TextBoxUser")).clear();
driver.findElement(By.id("TextBoxUser")).sendKeys("username");
driver.findElement(By.id("TextBoxPass")).clear();
driver.findElement(By.id("TextBoxPass")).sendKeys("password");
driver.findElement(By.id("Button1")).click();
Get the page content:
String htmlContent = driver.getPageSource();
Start using jsoup:
Document document = Jsoup.parse(htmlContent);
This has worked great for me.
Steffn Otto Jensen
Have you tried to do something like this:
Connection.Response res = Jsoup.connect("https://login.yahoo.com/config/login?")
        .data("login", "myusername", "passwd", "mypassword")
        .method(Method.POST)
        .execute();
Map<String, String> cookies = res.cookies();
Connection connection = Jsoup.connect("http://health.groups.yahoo.com/group/asthma/messages");
for (Map.Entry<String, String> cookie : cookies.entrySet()) {
    connection.cookie(cookie.getKey(), cookie.getValue());
}
Document doc = connection.get();
// Select the nodes you need, for example:
// Element e = doc.select(".ygrp-grdescr").first();
// System.out.println(e.text()); // Print => This list will be for asthmatics, and anyone whose life is affected by it. Discussions include causes, problems, and treatment
I hope this works for your problem.
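To see why forwarding every cookie sidesteps the question of which name is the session cookie, here is a minimal sketch that uses only the JDK. The class `CookieForwardingSketch` and its two helpers are hypothetical and not part of Jsoup, and the cookie values are made up; only the names B, Y and T come from the list in the question. It assumes the login response delivers its cookies as ordinary `Set-Cookie` headers.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Jsoup): illustrates what
// "send every cookie from the login response" means at the HTTP level.
final class CookieForwardingSketch {

    // Extracts name/value pairs from raw Set-Cookie header values,
    // ignoring attributes such as Path, Domain or HttpOnly.
    static Map<String, String> parseSetCookies(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieValues) {
            // The "name=value" pair is always the first segment.
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    // Joins all cookies into the value of a single Cookie request header.
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> c : cookies.entrySet()) {
            joiner.add(c.getKey() + "=" + c.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Made-up header values standing in for a real login response.
        List<String> fromLogin = Arrays.asList(
                "B=abc123; Path=/; Domain=.yahoo.com",
                "Y=v=1&n=xyz; Path=/; Domain=.yahoo.com",
                "T=z=token; Path=/; HttpOnly");
        Map<String, String> cookies = parseSetCookies(fromLogin);
        System.out.println(cookies.keySet());        // the cookie names the server set
        System.out.println(toCookieHeader(cookies)); // what the next request would send
    }
}
```

The point of the sketch is that the client never needs to know which cookie the server treats as the session: it returns all of them together. In Jsoup, the same effect is available without a loop, since `Connection.cookies(Map<String, String>)` accepts the whole map returned by `res.cookies()`.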