
Finding number of occurrences on website

I'm making a program that can search a website for a targeted word. I've been able to make it load the site; however, I don't know how to make the method searchHits find and count its target. Help would be appreciated.

public String[] searchHits(String target) {
    String[] out = new String[0];
    // TODO: find and count the occurrences of target
    return out;
}

public static void main(String[] args) throws IOException {
    String AFTEN = "https://theguardian.com/";
    String TARGET = "and";

I've also tried this without much luck:

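public int searchHits(String target, String aften) {
    String[] out = new String[0];
    int occurrences = 0;

    // contains() only reports whether target appears at all,
    // so this counts at most one occurrence.
    if (aften.contains(target)) {
        occurrences++;
    }
    return occurrences;
}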

Hmm... When looking for a string in an input text, you have to decide two things:

  1. Whether the target string has to be matched literally, or should match regardless of upper or lower case and of other language-specific symbols (accent marks, etc.).
  2. Whether the target string must be a complete word, or may be part of a word.

In the first case, you have to pre-process the input text first: convert it to lowercase and, depending on the target language, also strip the accent marks so that you end up with plain, lowercase text. Do the same with the target text.
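A minimal sketch of that pre-processing step, assuming the standard java.text.Normalizer is acceptable for removing accent marks. The class and method names (TextNormalizer, normalize) are made up for illustration and are not from the question:

```java
import java.text.Normalizer;
import java.util.Locale;

final class TextNormalizer {
    // Returns the text in lowercase with combining accent marks removed.
    static String normalize(String text) {
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches combining marks; dropping them leaves the base letters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT avoids locale-specific surprises when lowercasing.
        return stripped.toLowerCase(Locale.ROOT);
    }
}
```

You would apply normalize to both the page text and the target before comparing them. Whether dropping accents is appropriate depends on the language of the page.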

In the second case (complete words), you also have to tokenize the input text into individual words first, splitting on commas, periods, colons, semicolons, etc., and do the same with the target string. Then iterate over the full list of words looking for the target words.
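The tokenize-then-iterate idea could look roughly like this. It is a sketch that assumes a single-word target and treats any run of characters that are not letters or digits as a separator. WholeWordCounter and countWholeWords are hypothetical names:

```java
import java.util.Locale;

final class WholeWordCounter {
    // Counts how many tokens of text equal target, ignoring case.
    static int countWholeWords(String text, String target) {
        if (target.isEmpty()) {
            return 0;
        }
        String wanted = target.toLowerCase(Locale.ROOT);
        int occurrences = 0;
        // Split on runs of anything that is not a letter or a digit
        // (spaces, commas, periods, colons, semicolons, ...).
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.equals(wanted)) {
                occurrences++;
            }
        }
        return occurrences;
    }
}
```

With this split rule, "band" or "sand" do not count as hits for "and". Note that an apostrophe is also treated as a separator, so "don't" becomes two tokens; adjust the pattern if that matters for your text.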

If you want a simple approach, you should at least compare the text in a case-insensitive way. For this, you could use String.regionMatches instead of contains.
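As a sketch of that simple approach, the loop below slides over the text and uses String.regionMatches with ignoreCase set to true. It counts non-overlapping hits and also matches inside longer words. OccurrenceCounter and countIgnoreCase are illustrative names only:

```java
final class OccurrenceCounter {
    // Counts non-overlapping, case-insensitive occurrences of target in text.
    static int countIgnoreCase(String text, String target) {
        if (target.isEmpty()) {
            return 0;
        }
        int occurrences = 0;
        int i = 0;
        while (i <= text.length() - target.length()) {
            // regionMatches(ignoreCase, toffset, other, ooffset, len)
            if (text.regionMatches(true, i, target, 0, target.length())) {
                occurrences++;
                i += target.length(); // skip past this hit
            } else {
                i++;
            }
        }
        return occurrences;
    }
}
```

Unlike contains, which only says whether the target is present, this returns the actual number of hits, so searchHits could delegate to it once the page content has been loaded into a String.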
