
Getting page source using Java reads only the first line

I have never used Java before, but I am comfortable with PHP. I want to fetch the page source of a website, but I am on Appspot (GAE), where file_get_contents and cURL do not work, so I want to fetch the page source through Java instead. I learned some Java basics and found the code below, but it only returns the first line of the external page. Please point out where I am going wrong.

<?php

function get($url){

        import java.net.URL;
        import java.io.BufferedReader;
        import java.io.InputStreamReader;

        $java_url = new URL($url);
        $java_bufferreader = new BufferedReader(new InputStreamReader($java_url->openStream()));

        while (($line = $java_bufferreader->readLine()) != null) {
            $content .= $line;
        }

        return $content;
}


echo get("http://domain.com");

?>

For example, if I scrape stackoverflow.com, it returns only the following:

<!DOCTYPE html><html><head>        <title>Stack Overflow</title>    <link rel="shortcut icon" href="//cdn.sstatic.net/stackoverflow/img/favicon.ico">    <link rel="apple-touch-icon image_src" href="//cdn.sstatic.net/stackoverflow/img/apple-touch-icon.png">    <link rel="search" type="application/opensearchdescription+xml" title="Stack Overflow" href="/opensearch.xml">    <meta name="twitter:card" content="summary">    <meta name="twitter:domain" content="stackoverflow.com"/>    <meta name="og:type" content="website" />    <meta name="og:image" content="http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=fde65a5a78c6"/>    <meta name="og:title" content="Stack Overflow" />    <meta name="og:description" content="Q&amp;A for professional and enthusiast programmers" />    <meta name="og:url" content="http://stackoverflow.com/"/>

Try with the Scanner class.

<?php

function get($url){

        import java.net.URL;
        import java.util.Scanner;

        $java_url = new URL($url);
        $java_scanner = new Scanner($java_url->openStream());

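        // nextLine() never returns null; it throws once the input is exhausted,
        // so loop on hasNextLine() instead.
        while ($java_scanner->hasNextLine()) {
            // Re-append the line break that nextLine() strips.
            $content .= $java_scanner->nextLine() . "\n";
        }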

        return $content;
}


echo get("http://domain.com");

?>
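
For comparison, here is a minimal plain-Java sketch of the same Scanner approach. The class name `ScannerPageSource` and the helper `readAll` are made up for illustration, and the sample input is an in-memory string rather than a real URL, so that it runs without network access. It loops on `hasNextLine()`, because `Scanner.nextLine()` throws `NoSuchElementException` at the end of the input instead of returning null.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper class, not part of the original answer.
class ScannerPageSource {

    // Reads every line from the stream, blank lines included,
    // and puts back the line break that nextLine() strips.
    static String readAll(InputStream in) {
        StringBuilder content = new StringBuilder();
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            while (scanner.hasNextLine()) {
                content.append(scanner.nextLine()).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // Stand-in for url.openStream(); note the blank second line.
        byte[] page = "<html>\n\n<body></body>\n</html>\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(page)));
    }
}
```

With a real page, the stream passed to `readAll` would come from `new URL(...).openStream()`, as in the PHP version above.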

If that does not work either, initialize the `$content` variable with an empty string before the loop, just in case. :)
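
One possible explanation for the truncated output, not confirmed anywhere in this thread: `BufferedReader.readLine()` returns an empty string for a blank line and null only at the end of the stream. In PHP, the loose comparison `"" != null` evaluates to false, so if the Java bridge hands a blank line back as an empty PHP string, the question's loop would stop at the first blank line of the page. A strict `!== null` check would avoid that. The sketch below shows the plain-Java behavior; the class name `ReadAllLines` and the helper `readAll` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the original thread.
class ReadAllLines {

    static String readAll(Reader source) {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() yields "" for a blank line and null only at end of stream,
            // so the null check does not stop early on blank lines in Java.
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        // In-memory stand-in for a fetched page with a blank line in the middle.
        System.out.print(readAll(new StringReader("<head>\n\n<body>\n")));
    }
}
```

In Java the blank line survives the loop, which is why the same loop ported to PHP with a loose comparison can behave differently.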

