How do I grab elements having different classes using cheerio?

I am scraping Bing search results using Node and cheerio. I need to grab all the href values from two lists that have different IDs.

  1. How can I grab all the <a> tags from both these lists in one statement? I tried the code below, but it didn't work.
  2. From the first list, I do not want the <li> tags with the class "b_pag". How can I write a selector for that, like a "not" condition or something?

$("a", ["#b_content", "#b_context"]).each((index, element) => {
  const href = $(element).attr("href");
  links.push(href);
});

Refer to the attached screenshot for the HTML.

Update 2: I wanted to ignore the whole <li class="b_pag"> element, but the solutions I found here and elsewhere ignored only that tag itself. Any other <li> tag nested under it that has some other class, or no class at all, is not ignored.

I found a way around it: I could grab the <li> tags that have the other class names instead. Check out the HTML here. I am thinking of using four different selectors for the first four classes, like $(".b_algo") or $(".b_ans"). But how can I grab the other two <li> tags that have multiple classes associated with them? I could not get a clear idea from the cheerio docs. Hope I am clear enough for you guys! Something like $(".b_ans b_mop") didn't work. Nor did $("li[class=b_ans b_mop").

Try this:

const array = [];
// "#b_content, #b_context" is one selector group matching either container
$("#b_content, #b_context").find("a").each(function (i, elem) {
  array[i] = { a: $(this).attr("href") };
});

To select every li except those with the class "b_pag", use li:not(.b_pag).

Try this one:

// b_content is an id, so it is selected with # rather than .
$("#b_content li[class!='b_pag']").find("a").each((index, element) => {
  const href = $(element).attr("href");
  console.log(href);
});

If you want to ignore the class, use an attribute selector on the respective tag, like li[class!='b_pag'].

Try using the Bing Web Search API instead: https://azure.microsoft.com/en-us/services/cognitive-services/bing-web-search-api/

It is the legal and better way to get Bing search results. If you do not have a lot of searches to do, you can sign up for the free tier of this API. You can also use the Azure free credit that you receive when you join Azure.
