Finding number of occurrences on website
I'm making a program that can search a website for a target word. I've been able to make it load the site, but I don't know how to make the method searchHits find and count its target. Help would be appreciated.
public String[] searchHits(String target){
    String[] out = new String[0];
}
public static void main(String[] args) throws IOException {
    String AFTEN = "https://theguardian.com/";
    String TARGET = "and";
I've also tried this, without much luck:
public int searchHits(String target, String aften){
    String[] out = new String[0];
    int occurrences = 0;
    if (aften.contains(target)) {
        occurrences++;
    }
    return occurrences;
}
Hmm... When looking for a string in an input text, you have to take care of these matters: whether the comparison should be case-insensitive, and whether only complete words should count as matches.
In the first case, you first have to pre-process the input text: convert it to lowercase and, depending on the target language, even strip the accent marks, so that you end up with plain lowercase text. Do the same with the target text.
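The pre-processing step could look roughly like the sketch below. The class and method names (TextNormalizer, normalize) are made up for illustration, and it assumes the page text is already available as a String. It relies on java.text.Normalizer to split accented letters into a base letter plus combining marks, which are then removed.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of the original question's code.
final class TextNormalizer {

    // Returns the text in lowercase with accent marks removed.
    static String normalize(String text) {
        // NFD decomposes an accented letter into base letter + combining mark(s).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks, which are dropped here.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT avoids locale-specific surprises (e.g. the Turkish dotless i).
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute and \u00e8 is e-grave, escaped to keep the source ASCII-only.
        System.out.println(normalize("Caf\u00e9 AND Cr\u00e8me")); // prints "cafe and creme"
    }
}
```

Both the page text and the target would go through normalize before being compared.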
In the second case (complete words), you also have to tokenize the input text into individual words first, splitting on commas, periods, colons, semicolons, etc., and do the same with the target string. Then iterate over the full list of words, looking for the target words.
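A minimal sketch of the whole-word variant follows. WholeWordCounter and countWholeWords are hypothetical names, and the sketch assumes that the target is a single non-empty word and that the text passed in is the downloaded page content rather than the URL. It splits on any run of characters that are neither letters nor digits, which covers the punctuation mentioned above.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original question's code.
final class WholeWordCounter {

    // Counts how many tokens of the text equal the target, ignoring case.
    static int countWholeWords(String text, String target) {
        String wanted = target.toLowerCase(Locale.ROOT);
        int occurrences = 0;
        // Split on runs of non-letter, non-digit characters (spaces, commas, periods, ...).
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String word : words) {
            if (word.equals(wanted)) {
                occurrences++;
            }
        }
        return occurrences;
    }

    public static void main(String[] args) {
        // "Sand" and "band" contain "and" but are not counted as whole words.
        System.out.println(countWholeWords("Sand and stone, AND band; and.", "and")); // prints 3
    }
}
```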
If you want a simple approach, you should at least compare the text in a case-insensitive way. For this, you could use String.regionMatches instead of contains.
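As a rough sketch of that simple approach: contains only answers yes or no, so the attempt in the question can never return more than 1. Counting needs a loop over the start positions. The names below (OccurrenceCounter, countOccurrences) are hypothetical, the text is assumed to be the already downloaded page content, and matches are counted without overlap.

```java
// Hypothetical helper, not part of the original question's code.
final class OccurrenceCounter {

    // Counts non-overlapping, case-insensitive occurrences of target in text.
    static int countOccurrences(String text, String target) {
        if (target.isEmpty()) {
            return 0; // an empty target would otherwise match at every position
        }
        int occurrences = 0;
        int lastStart = text.length() - target.length();
        for (int i = 0; i <= lastStart; i++) {
            // ignoreCase = true compares the region without regard to case.
            if (text.regionMatches(true, i, target, 0, target.length())) {
                occurrences++;
                i += target.length() - 1; // jump past this match
            }
        }
        return occurrences;
    }

    public static void main(String[] args) {
        // Substring matches count too: "Sand", "and", "AND", "band".
        System.out.println(countOccurrences("Sand and stone, AND band", "and")); // prints 4
    }
}
```

Note that this counts substrings, so "band" contributes a hit for "and"; tokenizing first avoids that.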