nodejs pause sax stream
I'm using sax to parse very large XML files. I create a readStream from my XML file and pipe it into sax like this:
this.sourceStream = fs.createReadStream(file);
this.sourceStream
  .pipe(this.saxStream);
I'm listening to some events like this:
this.saxStream.on("error", (err) => {
  logger.error(`Error during XML Parsing`, err);
});
this.saxStream.on("opentag", (node) => {
  // doing some stuff
});
this.saxStream.on("text", (t) => {
  // doing some stuff
});
this.saxStream.on("closetag", () => {
  if( this.current_element.parent === null ) {
    this.sourceStream.pause();
    this.process_company_information(this.current_company, (err) => {
      if( err ) {
        logger.error("An error appeared while parsing company", err);
      }
      this.sourceStream.resume();
    });
  }
  else {
    this.current_element = this.current_element.parent;
  }
});
this.saxStream.on("end", () => {
  logger.info("Finished reading through stream");
});
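After a specific closing tag arrives in the sax stream, the stream needs to pause, the current element needs to be processed, and only then may the stream continue. As you can see in my code, I tried to pause sourceStream,
but I found that pausing a readStream does not work if it is piped.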
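So my general question is: how can I make the sax parser pause until the currently parsed element has been processed?
I've read about unpiping, pausing, and then piping again and resuming. Is that really the way to do it, and is it reliable?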
To illustrate, here are some logs:
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: New root tag found
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream
debug: Done with root tag, can continue stream
What I actually want is a log like this:
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
debug: Done with root tag, can continue stream
debug: New root tag found
In the current state, sax is much faster than the processor, so not pausing the stream will lead to memory problems.
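The interleaving in the first log can be reproduced without sax. In this minimal sketch, `makeToyParser` and `reproduce` are hypothetical names, and the toy parser is a stand-in for the sax stream (not a real XML parser): like sax, it reports every item of a chunk synchronously inside `write()`. Calling `source.pause()` from a handler therefore cannot stop the remaining items of the current chunk; it only delays later chunks, and each `resume()` reopens the flow while other items are still being processed. The Node.js stream documentation also warns that when a readable has piped destinations, calling `pause()` does not guarantee that it stays paused once those destinations drain and ask for more data.

```typescript
import { Readable, Writable } from 'node:stream';

// Hypothetical stand-in for a sax stream (not a real XML parser): each chunk
// is split on ',' and every item is reported synchronously inside write(),
// just as sax emits all events for a chunk before write() returns.
function makeToyParser(onItem: (item: string) => void): Writable {
  return new Writable({
    write(chunk, _encoding, callback) {
      for (const item of String(chunk).split(',')) {
        onItem(item);
      }
      callback();
    },
  });
}

// Mimics the question: pause the source when an item is found, process it
// asynchronously, then resume. Resolves with the log once everything is done.
function reproduce(chunks: string[]): Promise<string[]> {
  const log: string[] = [];
  const source = Readable.from(chunks);
  let pending = 0;
  let finished = false;
  return new Promise((resolve) => {
    const parser = makeToyParser((item) => {
      log.push(`found ${item}`);
      source.pause(); // too late for the other items of this chunk
      pending++;
      setTimeout(() => {
        log.push(`done ${item}`);
        source.resume();
        pending--;
        if (pending === 0 && finished) resolve(log);
      }, 10);
    });
    parser.on('finish', () => {
      finished = true;
      if (pending === 0) resolve(log);
    });
    source.pipe(parser);
  });
}

// The log starts with "found a", "found b", "found c" before any "done"
reproduce(['a,b,c', 'd,e']).then((log) => console.log(log.join('\n')));
```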
sax
is currently not actively maintained (https://github.com/isaacs/sax-js/issues/238). I suggest migrating to another parser, for example saxes:
https://github.com/lddubeau/saxes.
Instead of pausing and resuming the stream, you can use the for-await-of construct with a Generator
and an Iterable
(https://nodejs.org/api/stream.html#stream_consuming_readable_streams_with_async_iterators).
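The core of this suggestion can be shown with Node's built-in stream module alone. In this sketch, `consume` and `processChunk` are hypothetical names, and `processChunk` merely simulates slow work (standing in for something like `process_company_information`). Because the loop body awaits the processing, the next chunk is not handed to the loop until the current one is finished. The stream may still buffer a few chunks internally up to its highWaterMark, but it does not run ahead without bound.

```typescript
import { Readable } from 'node:stream';

// Hypothetical slow processing step; resolves after a short delay
function processChunk(chunk: string, log: string[]): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(`done ${chunk}`);
      resolve();
    }, 10);
  });
}

async function consume(readable: Readable): Promise<string[]> {
  const log: string[] = [];
  // The async iterator pulls the next chunk only after the loop body
  // (including the awaited processing) has completed
  for await (const chunk of readable) {
    log.push(`read ${chunk}`);
    await processChunk(String(chunk), log);
  }
  return log;
}

// Logs strictly alternating "read ..." / "done ..." lines
consume(Readable.from(['<a/>', '<b/>', '<c/>'])).then((log) => console.log(log.join('\n')));
```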
Install the dependencies: yarn add emittery saxes
or: npm install emittery saxes
Then do something like this:
import {createReadStream} from 'fs';
import {Readable} from 'stream';
import {SaxesParser, SaxesTagPlain} from 'saxes';
import Emittery from 'emittery';

export interface SaxesEvent {
  type: 'opentag' | 'text' | 'closetag' | 'end';
  tag?: SaxesTagPlain;
  text?: string;
}

/**
 * Generator method.
 * Parses one chunk of the iterable input (Readable stream in the string data reading mode).
 * @see https://nodejs.org/api/stream.html#stream_event_data
 * @param iterable Iterable or Readable stream in the string data reading mode.
 * @returns Array of SaxesParser events
 * @throws Error if a SaxesParser error event was emitted.
 */
async function *parseChunk(iterable: Iterable<string> | Readable): AsyncGenerator<SaxesEvent[], void, undefined> {
  const saxesParser = new SaxesParser<{}>();
  let error: Error | undefined;
  saxesParser.on('error', _error => {
    error = _error;
  });
  // As a performance optimization, we gather all events instead of passing
  // them one by one, which would cause each event to go through the event queue
  let events: SaxesEvent[] = [];
  saxesParser.on('opentag', tag => {
    events.push({type: 'opentag', tag});
  });
  saxesParser.on('text', text => {
    events.push({type: 'text', text});
  });
  saxesParser.on('closetag', tag => {
    events.push({type: 'closetag', tag});
  });
  for await (const chunk of iterable) {
    saxesParser.write(chunk as string);
    if (error) {
      throw error;
    }
    yield events;
    events = [];
  }
  // Flush the parser so that events still pending at the end of the input are emitted
  saxesParser.close();
  if (error) {
    throw error;
  }
  yield [...events, {type: 'end'}];
}

const eventEmitter = new Emittery();

// Example handler. Because the loop below awaits emitSerial(), the next
// event is not emitted until this async handler has settled.
eventEmitter.on('text', async (text) => {
  console.log('Start', text);
  // Simulate slow asynchronous processing
  await new Promise<void>((resolve) => setTimeout(resolve, 100));
  console.log('End', text);
});

const readable = createReadStream('./some-file.xml');
// Enable string reading mode
readable.setEncoding('utf8');
// Read stream chunks (top-level await requires an ES module)
for await (const saxesEvents of parseChunk(readable)) {
  // Process batch of events
  for (const saxesEvent of saxesEvents) {
    // Emit ordered events and process them in the event handlers strictly one-by-one
    // See https://github.com/sindresorhus/emittery#emitserialeventname-data
    await eventEmitter.emitSerial(saxesEvent.type, saxesEvent.tag || saxesEvent.text);
  }
}
See also the main discussion of this solution: https://github.com/lddubeau/saxes/issues/32
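For anyone who runs into a similar problem in the future, here are the other things I tried and how I finally got it working, even if it is something of a workaround.
I tried piping pause-stream and pass-stream in between the streams, since they are supposed to buffer while paused. For some reason, this again did not change the behavior at all.
In the end, I decided to solve the problem at its root: instead of creating a ReadStream and piping it into sax, I use line-by-line to read batches of lines from the XML and write them to the sax parser. This line-reading process can be paused properly, which finally gave me the behavior I wanted.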
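A similar effect can be had with Node's built-in readline module; it is swapped in here, and the author's own line reader may have worked differently. In this sketch, `PushParser`, `feedLineByLine` and `toyParser` are hypothetical names. With sax you would pass a `sax.parser()` instance, call its `write()`, and push completed elements from its `onclosetag` handler; the toy parser below only exists to make the example self-contained and is not a real XML parser. The key point is that each line is written to the parser and every element it completes is processed, awaited, before the next line is taken.

```typescript
import { Readable } from 'node:stream';
import { createInterface } from 'node:readline';

// Hypothetical minimal interface; a sax.parser() instance has a compatible write()
interface PushParser {
  write(text: string): unknown;
}

async function feedLineByLine(
  input: Readable,
  makeParser: (onElement: (element: string) => void) => PushParser,
  processElement: (element: string) => Promise<void>,
): Promise<void> {
  const completed: string[] = [];
  const parser = makeParser((element) => completed.push(element));
  const lines = createInterface({ input, crlfDelay: Infinity });
  for await (const line of lines) {
    parser.write(line + '\n');
    // Process every element completed by this line before taking the next line
    while (completed.length > 0) {
      await processElement(completed.shift()!);
    }
  }
}

// Toy parser for demonstration only (NOT a real XML parser): everything up to
// a closing </company> tag is reported as one element.
function toyParser(onElement: (element: string) => void): PushParser {
  let buffer = '';
  return {
    write(text) {
      buffer += text;
      let end: number;
      while ((end = buffer.indexOf('</company>')) !== -1) {
        const stop = end + '</company>'.length;
        onElement(buffer.slice(0, stop).trim());
        buffer = buffer.slice(stop);
      }
    },
  };
}

const xml = '<company>A</company>\n<company>\nB\n</company>\n';
feedLineByLine(Readable.from([Buffer.from(xml)]), toyParser, async (element) => {
  console.log('processing', JSON.stringify(element));
});
```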