![](/img/trans.png)
[英]node.js download multiple files asyncronously with fs writestream
[英]Node.js fs cheerio read and write multiple files
我從此處改編了以下代碼,將它們與Node.js和Cheerio一起使用以讀取html文件並將大型源文件拆分為小塊。 該代碼對於單個文件運行良好。
現在,我需要讀取多個大型html文件,並將它們一個接一個地拆分,然后將生成的文件輸出到一個文件夾中。 我如何讀寫文件夾中的每個文件,然后將其拆分?
這是代碼:
var cheerio = require('cheerio'),
fs = require('fs');
fs.readFile('./sourceHtml2/testone.html', 'utf8', dataLoaded);
function dataLoaded(err, data) {
$ = cheerio.load(data);
$('#toplevel > div').each(function (i, elem) {
var id = $(elem).attr('id'),
filename = id + '.html',
content = $.html(elem);
fs.writeFile('./output2/' + filename, content, function (err) {
console.log('Written html to ' + filename);
});
});
}
這是我的示例源文件
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Lorem Ipsum</title>
</head>
<body>
<div id="toplevel">
<div id="1-1">
<h1>HTML Ipsum Presents One</h1>
<p>
<strong>Pellentesque habitant morbi tristique</strong>senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper.
<h2>Header Level 2</h2>
<ol>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ol>
<h3>Header Level 3</h3>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ul>
</div>
<div id="1-2">
<h1>HTML Ipsum Presents Two</h1>
<p>
<strong>Pellentesque habitant morbi tristique</strong>senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper.
<h2>Header Level 2</h2>
<ol>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ol>
<blockquote>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus magna. Cras in mi at felis aliquet congue. Ut a est eget ligula molestie gravida. Curabitur massa. Donec eleifend, libero at sagittis mollis, tellus est malesuada tellus,
at luctus turpis elit sit amet quam. Vivamus pretium ornare est.</p>
</blockquote>
<h3>Header Level 3</h3>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ul>
</div>
<div id="1-3">
<h1>HTML Ipsum Presents Three</h1>
<p>
<strong>Pellentesque habitant morbi tristique</strong>senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper.
<h2>Header Level 2</h2>
<ol>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ol>
<blockquote>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus magna. Cras in mi at felis aliquet congue. Ut a est eget ligula molestie gravida. Curabitur massa. Donec eleifend, libero at sagittis mollis, tellus est malesuada tellus,
at luctus turpis elit sit amet quam. Vivamus pretium ornare est.</p>
</blockquote>
<h3>Header Level 3</h3>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ul>
</div>
</div>
</body>
</html>
對你的幫助表示感謝。
您需要將輸入目錄中的文件作為數組處理,並且還希望防止輸出文件夾中的文件名沖突。
下面提供的代碼為這兩個問題提供了解決方案。 從“輸入”子文件夾讀取HTML文件(.htm和.html),並將生成的文件寫入“輸出”子文件夾。
var cheerio = require('cheerio'),
fs = require('fs');
// process files found in the 'input' folder
fs.readdir('./input', 'utf8', findHtmlFiles);
function findHtmlFiles(err, files) {
if (files.length) {
files.forEach(function (fullFilename) {
var pattern = /\.[0-9a-z]{1,5}$/i;
var ext = (fullFilename).match(pattern);
// only process '.htm' and '.html' files
if (ext[0] == '.htm' || ext[0] == '.html') {
fs.readFile('./input/' + fullFilename, 'utf8', function (err, data) {
if (err)
throw err
else {
// add the file name to prevent collisions
// in the output folder
var fileData = {
file: fullFilename.slice(0, (ext[0].length * -1)),
data: data
};
dataLoaded(null, fileData);
}
});
}
});
}
}
function dataLoaded(err, fd) {
$ = cheerio.load(fd.data);
$('#toplevel > div').each(function (i, elem) {
var id = $(elem).attr('id'),
filename = fd.file + '_' + id + '.html',
content = $.html(elem);
fs.writeFile('./output/' + filename, content, function (err) {
console.log('Written html to ' + filename);
});
});
}
控制台輸出樣例:
Written html to testone_1-1.html
Written html to testone_1-2.html
Written html to testone_1-3.html
Written html to testtwo_1-1.html
Written html to testtwo_1-2.html
Written html to testtwo_1-3.html
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.