
How do I search for a word in a webpage

How do I search for the existence of a word in a webpage, given its URL, say "www.microsoft.com"? Do I need to download the webpage to perform this search?

You just need to make an HTTP request to the web page and grab all of its content; after that you can search for the necessary words in it. The code below might help you do so.

 import java.io.BufferedReader;
 import java.io.DataOutputStream;
 import java.io.InputStreamReader;
 import java.net.HttpURLConnection;
 import java.net.URL;
 import java.net.URLEncoder;

 public static void main(String[] args) {
    try {
        // Build request body
        String body =
            "fName=" + URLEncoder.encode("Atli", "UTF-8") +
            "&lName=" + URLEncoder.encode("Þór", "UTF-8");

        // Create connection
        URL url = new URL("http://www.example.com");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setDoInput(true);
        connection.setDoOutput(true);
        connection.setUseCaches(false);
        connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // Content-Length must be the byte length, not the character count
        connection.setRequestProperty("Content-Length", String.valueOf(body.getBytes("UTF-8").length));

        // Send request
        DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
        outStream.writeBytes(body);
        outStream.flush();
        outStream.close();

        // Get response - for debugging purposes only!
        // (DataInputStream.readLine() is deprecated; use BufferedReader instead.)
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(connection.getInputStream(), "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        reader.close();
    }
    catch (Exception ex) {
        System.out.println("Exception caught:\n" + ex.toString());
    }
 }
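For the question as actually asked (checking whether a word appears on a page such as "www.microsoft.com"), a plain GET request is enough; the POST body above is only needed when submitting a form. Here is a minimal sketch (the class and method names are my own, and note that a simple contains() check will also match text inside HTML tags and attributes):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class PageWordSearch {

    // Case-insensitive check for a word anywhere in the page text.
    static boolean containsWord(String content, String word) {
        return content.toLowerCase().contains(word.toLowerCase());
    }

    // Fetch the raw HTML of a page with a plain GET request.
    static String fetchPage(String address) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = fetchPage("http://www.microsoft.com");
        System.out.println(containsWord(html, "Windows"));
    }
}
```

Keep in mind this only sees the server-rendered HTML; content that is inserted by JavaScript after the page loads will not be in the downloaded source.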

I know how I would do it in theory: download it with cURL or some application, store the content in a variable, and then parse it as needed.

Yes, you need to download the page content and search inside it for what you want. And if it happens that you want to search the whole microsoft.com website, then you should either write your own web crawler, use an existing crawler, or use a search engine API like Google's.

Yes, you'll have to download the page, and to make sure you get the complete content, you'll want to execute scripts and include dynamic content, just like a browser does.

We can't "search" something on a remote resource that is not controlled by us, and no web server offers a "scan my content" method by default.

Most probably you'll want to load the page with a browser engine (WebKit or something else) and perform the search on the internal DOM structure of that engine.

If you want to do the search yourself, then obviously you have to download the page. If you're planning on this approach, I recommend Lucene (unless you want a simple substring search).
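For the simple-substring case, the JDK's built-in regex support already goes one step further than a raw substring check, letting you do a case-insensitive, whole-word search without pulling in Lucene. A small sketch (class and method names are my own):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordSearch {

    // Count whole-word, case-insensitive occurrences using \b word
    // boundaries, so "cat" does not match inside "category".
    static int countWholeWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The cat sat. A category is not a cat.";
        System.out.println(countWholeWord(text, "cat")); // prints 2
    }
}
```

Lucene earns its keep when you need ranked results, stemming, or repeated queries over many pages; for a one-off "does this word appear" check on a single downloaded page, this is usually enough.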

Or you could have a web service that does it for you. You could request the web service to fetch the URL and post back its results.

You could use a search engine's API. I believe Google and Bing ( http://msdn.microsoft.com/en-us/library/dd251056.aspx ) have ones you can use.
