Using for loop and forEach to extract html from website for JSON array of objects

I am trying to generate a JavaScript array of objects (to later be saved as a JSON file) from the HTML of this web page: https://remoteok.io/remote-dev+javascript-jobs

I want to extract the job descriptions for the first section (listed under "Today"). Each description is hidden beneath a dropdown until you click on its job listing. See screenshot:

[Screenshot: job description / job listing explainer]

The layout has been built with tables: the job listing and job description HTML containers are sibling table rows (<tr>).

On the site, the first "listing", with the text "the first health insurance for remote startups", is promotional content, so it has no hidden job description text beneath it when you click on it.

So in my code I start at index position 2 and then step by two in the for loop. This part works. But then I need to add each job description to the corresponding object in the array (called scrapedArray) that has already been created from other data:

const first_table_row = $(first_section).find('tr');
for (var i = 2; i < first_table_row.length; i += 2) {
  let job_description = $(first_table_row[i]).find('.markdown').html().trim();
  // console.log(job_description);

  scrapedArray.forEach((obj) => {
    obj["job_description"] = job_description;
  });
}

Console logging job_description without the forEach prints each distinct job description as intended, but when I include the forEach, every object ends up with the same HTML from the same Scalable Path job listing. See here (the job_description output is truncated because the HTML is quite long):

(2) [{…}, {…}]
0:
company_logo: "https://remoteok.io/assets/jobs/07a835281c655f47e04cd5799f27d219.png?1584688805"
job_description: "\nScalable Path is looking for a Senior Full Stack.."
__proto__: Object
1:
company_logo: "https://remoteok.io/assets/jobs/9e96332ed226d8ffd20da84b6951b66e.png?1584649206"
job_description: "\nScalable Path is looking for a Senior Full Stack.."

What am I doing wrong? Is there a better way to do this?

When you run forEach inside your for loop, every pass of the for loop overwrites job_description on all objects in the array, so once the loop finishes they all hold the same value (the one from the last iteration). I've changed the for loop to start from 1 instead of 2, because you need company_logo too, right? Check the code below; I've tested it and it works!

let scrapedArray = [];
// Get the first tbody
let first_section = $('#jobsboard tbody')[0];

const first_table_row = $(first_section).find('tr');
for (var i = 1; i < first_table_row.length; i += 2) {
  let company_logo = $(first_table_row[i]).find('.logo').attr('src');
  let job_description = $(first_table_row[i + 1]).find('.markdown').html().trim();
  scrapedArray.push({
    company_logo, job_description
  });
}
console.log(scrapedArray);
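
If you would rather keep the scrapedArray you have already built from other data, a rough alternative is to assign each description by index instead of using forEach. The sketch below uses plain arrays as hypothetical stand-ins for the scraped rows (rowDescriptions and attachDescriptions are made-up names, not part of the original code), and it assumes that scrapedArray is in the same order as the listings on the page, with the descriptions sitting at row indexes 2, 4, 6 and so on:

```javascript
// Hypothetical stand-in for the array already built from other data.
const scrapedArray = [
  { company_logo: "logo-a.png" },
  { company_logo: "logo-b.png" },
];

// Hypothetical stand-in for the table rows: index 0 is the promotional row,
// odd indexes are listings (no description), even indexes >= 2 hold descriptions.
const rowDescriptions = [null, null, "Description A", null, "Description B"];

// Assign each description to exactly one object instead of to all of them.
function attachDescriptions(jobs, descriptions) {
  for (let i = 2; i < descriptions.length; i += 2) {
    const job = jobs[i / 2 - 1]; // row 2 -> job 0, row 4 -> job 1, ...
    if (job) {
      // Skip rows that have no matching object rather than throwing.
      job.job_description = descriptions[i];
    }
  }
  return jobs;
}

attachDescriptions(scrapedArray, rowDescriptions);
console.log(scrapedArray);
```

In the real scraper, the value at descriptions[i] would come from the .markdown lookup on first_table_row[i]. It may be worth checking that .html() actually returned a string before calling .trim(), since a row without a .markdown element would otherwise throw.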

Hope this helps!
