
nodejs pause sax stream


I'm using sax to parse very large XML files. I create a readStream from my XML file and pipe it into sax, like this:

this.sourceStream = fs.createReadStream(file);
this.sourceStream
    .pipe(this.saxStream);

I'm listening to some events like this:

this.saxStream.on("error", (err) => {
    logger.error(`Error during XML Parsing`, err);
});
this.saxStream.on("opentag", (node) => {
    // doing some stuff
});
this.saxStream.on("text", (t) => {
    // doing some stuff
});
this.saxStream.on("closetag", () => {
    if( this.current_element.parent === null ) {
        this.sourceStream.pause();
        this.process_company_information(this.current_company, (err) => {
            if( err ) {
                logger.error("An error appeared while parsing company", err);
            }
            this.sourceStream.resume();
        });
    }
    else {
        this.current_element = this.current_element.parent;
    }
});
this.saxStream.on("end", () => {
    logger.info("Finished reading through stream");
});

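After a specific closing tag arrives in the sax stream, the stream needs to pause, the current element needs to be processed, and only then can the stream continue. As you can see in my code, I tried to pause sourceStream, but I found out that pausing a readStream does not work when it is piped.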

So my general question is: how can I make the sax parser pause until the currently parsed element has been processed?

I've read about unpiping and pausing, then piping again and resuming. Is that really the way to do it, and is it reliable?
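The log pattern below is consistent with how sax handles a chunk (this is my reading of the behavior, not something stated in the question): SAXStream.write() passes each chunk, 64 KiB by default for fs.createReadStream, to the parser, which emits every opentag/text/closetag in that chunk synchronously before write() returns. Pausing sourceStream only stops the next chunk; it cannot interrupt the events already being emitted. The Node.js stream docs also warn that pausing a stream with piped destinations is not guaranteed to last once a destination asks for more data. The simulation uses a plain EventEmitter as a stand-in for the sax stream; fakeSaxStream and simulateChunk are hypothetical names, not sax APIs.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the sax stream: it emits events synchronously, the way a
// parser does while working through one chunk inside a single write() call.
const fakeSaxStream = new EventEmitter();
const log = [];

fakeSaxStream.on('closetag', () => {
  log.push('New root tag found');
  // Asynchronous processing: this callback can only run after the current
  // synchronous burst of events has completely finished.
  setImmediate(() => log.push('Done with root tag, can continue stream'));
});

// Hypothetical helper: one chunk that happens to contain several root elements.
function simulateChunk(rootTagCount) {
  for (let i = 0; i < rootTagCount; i++) {
    fakeSaxStream.emit('closetag');
  }
}

simulateChunk(3);

// setImmediate callbacks run in FIFO order, so this one resolves after all
// of the "Done" entries above have been pushed.
const finished = new Promise((resolve) => setImmediate(() => resolve(log)));
finished.then((entries) => console.log(entries.join('\n')));
```

With three root elements in one chunk, all three "New root tag found" lines are logged before any "Done" line, the same shape as the log in the question.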

To illustrate, here are some logs:

debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream

What I actually want is a log like this:

debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found

As things stand, sax is much faster than the processing, so not pausing the stream leads to memory problems.
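Because the underlying problem is a fast producer and a slow consumer, the standard Node.js mechanism is backpressure: if the last stage of a pipe is a Writable that calls its write callback only after the asynchronous work has finished, pipe()/pipeline() stops pulling from the source by itself, with no manual pause(). This sketch assumes the parsed companies have already been turned into objects; processRecord and the { id } records are hypothetical stand-ins for process_company_information and this.current_company. On its own it does not fix the sax case, because sax still emits a whole chunk's events at once, so each completed company would still have to be pushed into a stream like this one instead of being processed directly in the closetag handler.

```javascript
const { Readable, Writable, pipeline } = require('stream');

// Hypothetical slow, asynchronous processing step (stands in for
// process_company_information in the question).
function processRecord(record) {
  return new Promise((resolve) => setTimeout(() => resolve(record.id), 10));
}

const order = [];

// Node calls write() again only after the previous callback() has run, so
// records are processed strictly one at a time. highWaterMark: 1 makes the
// Writable signal backpressure as soon as one record is pending, so
// pipeline() stops pulling from the source until it drains.
const processor = new Writable({
  objectMode: true,
  highWaterMark: 1,
  write(record, _encoding, callback) {
    order.push(`start ${record.id}`);
    processRecord(record).then((id) => {
      order.push(`done ${id}`);
      callback();
    }, callback); // pass processing errors on to pipeline()
  },
});

const records = Readable.from([{ id: 1 }, { id: 2 }, { id: 3 }]);

const done = new Promise((resolve, reject) => {
  pipeline(records, processor, (err) => (err ? reject(err) : resolve(order)));
});

done.then((entries) => console.log(entries.join('\n')));
```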

sax is currently not actively maintained (https://github.com/isaacs/sax-js/issues/238). I suggest migrating to another parser, for example saxes: https://github.com/lddubeau/saxes

Instead of pausing and resuming the stream, you can use a Generator / Iterable with the for-await-of construct (https://nodejs.org/api/stream.html#stream_consuming_readable_streams_with_async_iterators).
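Stripped of saxes and emittery, the idea is that for await...of takes one chunk at a time from the Readable, and because the loop body can await asynchronous work, the next chunk is not taken until that work is done; the stream just buffers up to its highWaterMark in the meantime. In this sketch, Readable.from with hard-coded strings replaces the XML file, and handleChunk is a hypothetical placeholder for parsing a chunk and processing its elements.

```javascript
const { Readable } = require('stream');

// Hypothetical asynchronous handler for one chunk (parsing + processing).
async function handleChunk(chunk, log) {
  log.push(`start ${chunk}`);
  await new Promise((resolve) => setTimeout(resolve, 10));
  log.push(`end ${chunk}`);
}

async function consume(readable) {
  const log = [];
  // The loop takes the next chunk only after the body, including the
  // awaited handler, has finished; no pause()/resume() is needed.
  for await (const chunk of readable) {
    await handleChunk(chunk, log);
  }
  return log;
}

const source = Readable.from(['<a/>', '<b/>', '<c/>']);
const result = consume(source);
result.then((log) => console.log(log.join('\n')));
```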

Install the dependencies: yarn add emittery saxes or npm install emittery saxes

Then do something like this:

import {createReadStream} from 'fs';
import {Readable} from 'stream';
import {SaxesParser, SaxesTagPlain} from 'saxes';
import Emittery from 'emittery';

export interface SaxesEvent {
  type: 'opentag' | 'text' | 'closetag' | 'end';
  tag?: SaxesTagPlain;
  text?: string;
}

/**
 * Async generator.
 * Feeds the iterable input (e.g. a Readable stream in string mode) to a
 * SaxesParser chunk by chunk and yields the events produced by each chunk.
 * @see https://nodejs.org/api/stream.html#stream_event_data
 * @param iterable Iterable or Readable stream in string mode (setEncoding called).
 * @returns Batches of SaxesParser events, one batch per chunk
 * @throws Error if the SaxesParser emitted an error event.
 */
async function *parseChunk(iterable: Iterable<string> | Readable): AsyncGenerator<SaxesEvent[], void, undefined> {
  const saxesParser = new SaxesParser<{}>();
  let error: Error | undefined;
  saxesParser.on('error', _error => {
    error = _error;
  });

  // As a performance optimization, we gather all events instead of passing
  // them one by one, which would cause each event to go through the event queue
  let events: SaxesEvent[] = [];
  saxesParser.on('opentag', tag => {
    events.push({
      type: 'opentag',
      tag
    });
  });

  saxesParser.on('text', text => {
    events.push({
      type: 'text',
      text
    });
  });

  saxesParser.on('closetag', tag => {
    events.push({
      type: 'closetag',
      tag
    });
  });

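  for await (const chunk of iterable) {
    saxesParser.write(chunk as string);
    if (error) {
      throw error;
    }

    // Hand over the events of this chunk; the caller processes them
    // before the next chunk is read
    yield events;
    events = [];
  }

  // Flush the parser so that events for any trailing data are emitted
  saxesParser.close();
  if (error) {
    throw error;
  }

  yield [...events, {
    type: 'end'
  }];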
}

const eventEmitter = new Emittery();
eventEmitter.on('text', async (text) => {
  console.log('Start', text);
  // Simulate slow asynchronous processing; the awaited emitSerial() call
  // below only resolves after this handler's promise has settled
  await new Promise<void>((resolve) => setTimeout(resolve, 100));
  console.log('End', text);
});

const readable = createReadStream('./some-file.xml');
// Enable string reading mode so that every chunk is a string
readable.setEncoding('utf8');
// Read stream chunks (top-level await requires running this file as an ES module)
for await (const saxesEvents of parseChunk(readable)) {
  // Process the batch of events produced by one chunk
  for (const saxesEvent of saxesEvents) {
    // Emit ordered events and process them in the event handlers strictly one by one
    // See https://github.com/sindresorhus/emittery#emitserialeventname-data
    await eventEmitter.emitSerial(saxesEvent.type, saxesEvent.tag ?? saxesEvent.text);
  }
}

Also see the main discussion of this solution: https://github.com/lddubeau/saxes/issues/32

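For anyone who runs into a similar problem in the future, here are the other approaches I tried and how I finally got it working, even though it is something of a workaround.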

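I tried piping the stream through pause-stream and pass-stream in between, because they are supposed to buffer while paused. For some reason, this again did not change the behavior at all.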

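In the end, I decided to tackle the problem at its root: instead of creating a ReadStream and piping it into sax, I use line-by-line to read lines from the XML in batches and write them to the sax parser. This line-reading process can be paused correctly, and it finally gave me the behavior I wanted.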
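The author's line-by-line code is not shown. A rough, dependency-free equivalent (my substitution, not the original code) is Node's built-in readline module, whose interface can be consumed with for await...of: each batch of lines is fed to the parser and fully processed before the next batch is taken, and lines that arrive in the meantime are queued rather than dropped. feedParser is a hypothetical stand-in for writing the lines to the sax parser and processing the elements they complete; BATCH_SIZE is arbitrary, and Readable.from replaces fs.createReadStream so the sketch runs on its own.

```javascript
const readline = require('readline');
const { Readable } = require('stream');

const BATCH_SIZE = 2; // arbitrary; tune for real files

// Hypothetical: write a batch of lines to the XML parser and wait until
// every element completed by this batch has been processed.
async function feedParser(lines, log) {
  await new Promise((resolve) => setTimeout(resolve, 10));
  log.push(lines.join(''));
}

async function readInBatches(input) {
  const log = [];
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  let batch = [];
  for await (const line of rl) {
    batch.push(line);
    if (batch.length === BATCH_SIZE) {
      // The next line is not taken from the iterator until this resolves
      await feedParser(batch, log);
      batch = [];
    }
  }
  // Flush the last, possibly incomplete batch
  if (batch.length > 0) {
    await feedParser(batch, log);
  }
  return log;
}

const xml = Readable.from(['<root>\n<company/>\n', '<company/>\n</root>\n']);
const batches = readInBatches(xml);
batches.then((log) => console.log(log));
```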


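Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.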

 