Disclaimer: this is a self-answered post, written in the hope of saving others some time.
Setup:
I've been using Chrome's implementation of the Native File System API [1] [2] [3].
This requires enabling the flag chrome://flags/#native-file-system-api.
For starters, I want to recursively read a directory and obtain a list of its files. This is simple enough:
let paths = [];
let recursiveRead = async (path, handle) => {
  let reads = [];
  // window.handle = handle;
  for await (let entry of await handle.getEntries()) { // <<< HANGING
    if (entry.isFile)
      paths.push(path.concat(entry.name));
    else if (/* check some whitelist criteria to restrict which dirs are read */)
      reads.push(recursiveRead(path.concat(entry.name), entry));
  }
  await Promise.all(reads);
  console.log('done', path, paths.length);
};
chooseFileSystemEntries({type: 'openDirectory'}).then(handle => {
  recursiveRead([], handle).then(() => {
    console.log('COMPLETELY DONE', paths.length);
  });
});
I've also implemented a non-recursive while-loop-queue version, and lastly a Node fs.readdir version. All 3 solutions work fine for small directories.
The problem:
But then I tried running it on some sub-directories of the Chromium source code ('base', 'components', and 'chrome'); together the 3 sub-dirs contain ~63,000 files. While the Node implementation worked fine (and, surprisingly, it used cached results between runs, making runs after the first instantaneous), both browser implementations hung.
Attempted debugging:
Sometimes they would return the full 63k files and print 'COMPLETELY DONE' as expected. But most often (90% of the time) they would read 10k-40k files before hanging.
I dug deeper into the hanging, and it was the for await line itself that hung. So I added the line window.handle = handle immediately before the for loop; when the function hung, I ran the same for loop directly in the browser console, and it completed correctly. So now I was stuck with seemingly working code that randomly hangs.
Solution:
I tried skipping over directories that would hang:
let whitelistDirs = {src: ['base', 'chrome', 'components', /*'ui'*/]}; // 63800
let readDirEntry = (handle, timeout = 500) => {
  return new Promise(async (resolve, reject) => {
    setTimeout(() => reject('timeout'), timeout);
    let entries = [];
    for await (const entry of await handle.getEntries())
      entries.push(entry);
    resolve(entries);
  });
};
let readWhile = async entryHandle => {
  let paths = [];
  let pending = [{path: [], handle: entryHandle}];
  while (pending.length) {
    let {path, handle} = pending.pop();
    await readDirEntry(handle)
      .then(entries =>
        entries.forEach(entry => {
          if (entry.isFile)
            paths.push({path: path.concat(entry.name), handle: entry});
          else if (path.length || !whitelistDirs[handle.name] || whitelistDirs[handle.name].includes(entry.name))
            pending.push({path: path.concat(entry.name), handle: entry});
        }))
      .catch(() => console.log('skipped', handle.name));
    console.log('paths read:', paths.length, 'pending remaining:', pending.length, path);
  }
  console.log('read complete', paths.length);
  return paths;
};
chooseFileSystemEntries({type: 'openDirectory'}).then(handle => {
  readWhile(handle).then(paths => {
    console.log('COMPLETELY DONE', paths.length);
  });
});
And the results showed a pattern. Once a directory read hung and was skipped, the subsequent ~10 dir reads would likewise hang and be skipped. Then the following reads would resume functioning properly until the next similar incident.
// begins skipping
paths read: 45232 pending remaining: 49 (3) ["chrome", "browser", "favicon"]
VM60:25 skipped extensions
VM60:26 paths read: 45239 pending remaining: 47 (3) ["chrome", "browser", "extensions"]
VM60:25 skipped enterprise_reporting
VM60:26 paths read: 45239 pending remaining: 46 (3) ["chrome", "browser", "enterprise_reporting"]
VM60:25 skipped engagement
VM60:26 paths read: 45266 pending remaining: 45 (3) ["chrome", "browser", "engagement"]
VM60:25 skipped drive
VM60:26 paths read: 45271 pending remaining: 44 (3) ["chrome", "browser", "drive"]
// begins working properly again
So the issue seemed temporal. I added a simple retry wrapper with a 500ms wait between retries, and the reads began working fine.
let sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
let readDirEntryRetry = async (handle, timeout = 500, tries = 5, waitBetweenTries = 500) => {
  while (tries--) {
    try {
      return await readDirEntry(handle, timeout);
    } catch (e) {
      console.log('readDirEntry failed, tries remaining:', tries, handle.name);
      if (!tries)
        throw e;
      await sleep(waitBetweenTries);
    }
  }
};
Conclusion:
The non-standard Native File System API hangs when reading large directories. Simply waiting and retrying resolves the issue. It took me a good week to arrive at this solution, so I thought it'd be worth sharing.
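The timeout-plus-retry logic above generalizes to any promise-returning operation, not just directory reads. Here is a sketch of that general pattern under my own naming (retry and withTimeout are hypothetical helpers, not part of any API):

```javascript
// Generic timeout-and-retry sketch: race an operation against a timeout,
// and retry with a fixed delay when it times out or rejects.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_, reject) => setTimeout(() => reject(new Error('timeout')), ms)),
  ]);

async function retry(fn, {tries = 5, timeout = 500, waitBetweenTries = 500} = {}) {
  let lastError;
  while (tries--) {
    try {
      return await withTimeout(fn(), timeout);
    } catch (e) {
      lastError = e;
      if (tries) await sleep(waitBetweenTries); // wait, then try again
    }
  }
  throw lastError; // all attempts exhausted
}
```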