How to make a node.js webscraper periodically check an endpoint for data updates?

I am writing a Discord bot which aggregates data from a third-party API.

There is a design pattern from discord.js which I want to follow for my web-scraping functions, wherein one instantiates a client object and performs actions when the client emits specific events, like so:

const Discord = require('discord.js');
const client = new Discord.Client();

client.on('ready', () => {
  console.log(`Logged in as ${client.user.tag}!`);
});

client.on('message', msg => {
  if (msg.content === 'ping') {
    msg.reply('Pong!');
  }
});

client.login('token');

To my understanding, this code will run indefinitely, performing actions each time a specific event is emitted, e.g. ready or message.

I cannot find out how such functionality is implemented. More specifically, I can't figure out how the discord.js client object continually looks for changes and emits an event when it notices them.

The reason I want to emulate this design pattern is so that I can run one node.js application which, say every 10 minutes, reaches out to the API to see if there is new information, and logs it into a database when there are changes.

My initial thought is something along these lines, but it crashes the process with an out-of-memory error.

const events = require("events");

class ScrapeEmitter extends events.EventEmitter {}
const scrapeEmitter = new ScrapeEmitter();

scrapeEmitter.on("timeExpired", () => console.log("call scraping code here"));

while (true) {
  setTimeout(() => scrapeEmitter.emit("timeExpired"), 1500);
}

The end goal is to write the following from index.js, and have it both listen for Discord events and scrape for data:

const scraper = require("./core/scraper");
const Discord = require('discord.js');
const client = new Discord.Client();

client.on('ready', () => {
  console.log(`Logged in as ${client.user.tag}!`);
});

client.on('message', msg => {
  if (msg.content === 'ping') {
    msg.reply('Pong!');
  }
});

client.login('token');
scraper.begin_scraping();

This portion of code

while (true) {
  setTimeout(() => scrapeEmitter.emit("timeExpired"), 1500);
}

creates an infinite number of timeouts. Because the while loop is synchronous, it never yields control to the event loop, so none of the scheduled callbacks ever fire and new timeouts accumulate until memory runs out. What you need to do is start a timeout only after the previous one has completed. An example is:

function loop() {
  // do the scraping work here, then schedule the next run
  setTimeout(loop, 1500);
}
loop();

This calls the function again after 1500 milliseconds (1.5 seconds), which in turn schedules another call after 1500 milliseconds, and so on.
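
One property of this recursive form worth noting: if the work inside the callback is asynchronous, you can wait for it to finish before scheduling the next run, so a slow scrape never overlaps the next one. A sketch, assuming a hypothetical async scrape() function:

async function loop() {
  await scrape(); // hypothetical: fetch the API and record any changes
  setTimeout(loop, 1500); // schedule the next run only once this one is done
}
loop();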

However, the better solution is to use setInterval(). It looks like this:

function loop() {
  // scraping work goes here
}
setInterval(loop, 1500);

So, instead of writing

while (true) {
  setTimeout(() => scrapeEmitter.emit("timeExpired"), 1500);
}

Write

setInterval(() => scrapeEmitter.emit("timeExpired"), 1500);

This removes the infinite loop and acts as expected.

I'm just translating @Worthy Alpaca's answer into a comment. It's a community wiki, so I get no reputation.
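
Putting the pieces together, here is a minimal sketch of what ./core/scraper might export so that index.js can call scraper.begin_scraping() alongside the discord.js client. The names fetchLatest and saveToDatabase are hypothetical placeholders for the real API request and database write:

const { EventEmitter } = require("events");

class ScrapeEmitter extends EventEmitter {}
const scrapeEmitter = new ScrapeEmitter();

// Hypothetical helpers -- swap in the real API request and database write.
async function fetchLatest() { /* GET the third-party endpoint, return new data or null */ }
async function saveToDatabase(data) { /* persist the changes */ }

scrapeEmitter.on("timeExpired", async () => {
  const data = await fetchLatest();
  if (data) {
    await saveToDatabase(data);
  }
});

// Emit the event every 10 minutes. setInterval only schedules callbacks on
// the event loop, so the same process keeps servicing discord.js events
// in between scrapes.
function begin_scraping() {
  setInterval(() => scrapeEmitter.emit("timeExpired"), 10 * 60 * 1000);
}

module.exports = { begin_scraping };

Because everything here is non-blocking, the discord.js client and the scraper can share one node.js process, which is the end goal described above.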
