Search for a keyword in the body text

Question

I use an ajax call to get the HTML from an external URL:

 var uriData = $.ajax({ url: 'http://www.example.com', success: function(data) { alert(data); } });

That works fine. I get an alert with the HTML of the external website.

Is there an easy way to search for a keyword and count its occurrences in the text or in the headlines of the HTML body?

I tried it this way:

HTML

<input id="url" type="text" name="url">
<input id="keyword" type="text" name="keyword">

SCRIPT

 function keyWords() {
   var website = jQuery('#url').val(); 
   var keyword = jQuery('#keyword').val(); 

   jQuery.ajax({ url: website, success: function(data) { 
   var sumKeyword = data.split(keyword).length - 1;    
   alert (sumKeyword); } 
   });
 };

Unfortunately, this searches for the keyword in the entire HTML (for example, also in anchor text). In the end, I want the number of keyword occurrences in the headlines and in the text (p, span, etc.).

Answer 1

Here's a chunk of code that might inspire you to solve your problem:

   var data='<span id="url" type="text" name="url">test this test</span>';
    var message = $('<div/>').append(data).find("span:contains('test')").each(function(){

       var sumKeyword = $(this).text().split("test").length - 1;  
       alert (sumKeyword); 

    });

jsfiddle
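The same idea can be applied to several element types at once with the standard DOM methods `querySelectorAll` and `textContent`. The following is only a minimal sketch under stated assumptions: the function name `countKeywordInElements` and the selector list are hypothetical, and anchor text that sits inside a matched paragraph is still counted, because `textContent` includes the text of all descendants.

```javascript
// Hypothetical helper: count keyword occurrences in the text of all
// elements matching `selector`. `root` is anything that exposes the
// standard querySelectorAll method, e.g. a Document built by DOMParser.
function countKeywordInElements(root, selector, keyword) {
  if (!keyword) return 0; // split('') would break the text into single characters
  let total = 0;
  for (const el of root.querySelectorAll(selector)) {
    // Skip elements nested inside another match, so that their text
    // is not counted a second time through the ancestor's textContent.
    if (el.parentElement && el.parentElement.closest(selector)) continue;
    total += el.textContent.split(keyword).length - 1;
  }
  return total;
}

// Browser usage (assumes `data` holds the fetched HTML string):
//   const doc = new DOMParser().parseFromString(data, 'text/html');
//   const hits = countKeywordInElements(doc, 'h1, h2, h3, p, span', 'test');
```

Parsing into a separate document keeps the fetched markup out of the current page, and the selector string decides which parts of the page count as "headlines" or "text".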

Answer 2

You could do this with the match function:

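jQuery.ajax({ url: website, success: function(data) {
   // [\s\S] also matches line breaks; [^>]* allows attributes on the <body> tag
   var match = data.match(/<body[^>]*>([\s\S]*)<\/body>/i);
   var body = match ? match[1] : data; // fall back to the whole response if no <body> is found
   var sumKeyword = body.split(keyword).length - 1;
   alert(sumKeyword); }
});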
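Note that counting with `split` is case-sensitive and gives a misleading result for an empty keyword. As a rough sketch (the helper names `escapeRegExp` and `countOccurrences` are assumptions, not part of the original answer), a regular-expression count could look like this:

```javascript
// Escape characters that have a special meaning in a regular expression,
// so the keyword is always matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Count non-overlapping occurrences of `keyword` in `text`.
// Matching is case-insensitive unless `ignoreCase` is set to false.
function countOccurrences(text, keyword, ignoreCase = true) {
  if (!keyword) return 0; // an empty keyword has no meaningful count
  const pattern = new RegExp(escapeRegExp(keyword), ignoreCase ? 'gi' : 'g');
  const matches = text.match(pattern);
  return matches ? matches.length : 0;
}
```

Inside the `success` callback, `countOccurrences(body, keyword)` could then stand in for the `split` expression.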

Answer 3

Extracting the textual content from the boilerplate of an HTML page is a common task tackled by multiple external APIs and libraries. You cannot simply collect all the text in a web page, because you will end up with lots of irrelevant content such as advertisements. Tools like Diffbot can identify the title/header and the body and suggest some tags and keywords. Afterwards, you can run your analysis on the extracted text.

External APIs

3 answers

Answer 1
2 Accepted 2014-09-01 10:30:17

Answer 2
1 2014-09-01 10:00:16

Answer 3
1 2014-09-01 10:05:12
