How to improve performance of functional code in JS

I am trying to learn functional programming and I cannot figure this one out. In my minimal working example, I have a list of dictionaries, each containing a filename and the size of the file. I want to create a new dictionary that groups the files by size (this is part of a larger algorithm that finds duplicated files).

Here is the "traditional" approach to this, mutating data:

const groupFilesBySize = (allFileData) => {
  const filesSortedBySize = {};
  for (const fileData of allFileData) {
    if (fileData.size in filesSortedBySize) {
      filesSortedBySize[fileData.size].push(fileData.file);
    } else {
      filesSortedBySize[fileData.size] = [fileData.file];
    }
  }

  return filesSortedBySize;
};

And here is my "best" attempt to do it in a functional way:

const groupFilesBySizeFunctional = (allFileData) =>
  allFileData.reduce(
    (filesSortedBySize, fileData) => ({
      ...filesSortedBySize,
      [fileData.size]: filesSortedBySize[fileData.size]
        ? [...filesSortedBySize[fileData.size], fileData.file]
        : [fileData.file]
    }),
    {}
  );

I have benchmarked them (reproducible example below) and the functional version is thousands of times slower (roughly 5,800 times in the run shown below). This is no joke: it is just plain unusable. I suspect that creating a new dictionary for every file processed in reduce is what causes the delay.
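To make that suspicion concrete, here is a rough cost sketch rather than a measurement. It assumes the worst case where every file has a distinct size, so each spread of the accumulator copies every key seen so far. `countSpreadCopies` is a hypothetical helper name, not part of the code above.

```javascript
// Hypothetical helper: counts how many existing keys the object spread
// copies over a whole reduce, assuming every size is distinct.
const countSpreadCopies = (n) => {
  let copies = 0;
  let keysSoFar = 0;
  for (let i = 0; i < n; i++) {
    copies += keysSoFar; // `...filesSortedBySize` copies every existing key
    keysSoFar += 1;      // then one new key is added
  }
  return copies;         // equals n * (n - 1) / 2
};

console.log(countSpreadCopies(10));   // 45
console.log(countSpreadCopies(6229)); // 19397106
```

The count grows quadratically with the number of files, while the mutating loop does a constant amount of work per file. With repeated sizes the real number of key copies is lower, but the bucket arrays are also copied by the array spread, which this sketch does not count.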

Nonetheless, I now see two possibilities: either functional programming performs terribly, or I cannot write proper functional code. Since the second is obviously the case, I would like to ask: what is the proper way to write the function groupFilesBySize in a functional style?


Benchmark: Use this function to obtain the array of file paths and file sizes:

const fs = require('fs').promises;
const path = require('path');

async function walk(dir) {
  const files = await fs.readdir(dir);
  const parsedFiles = await Promise.all(files.map(async (fileName) => {
    const filePath = path.join(dir, fileName);

    const stats = await fs.lstat(filePath);
    if (stats.isSymbolicLink() || stats.size === 0) {
      return null;
    }
    if (stats.isDirectory()) {
      return walk(filePath);
    } else if (stats.isFile()) {
      return { file: filePath, size: stats.size };
    }
  }));

  return parsedFiles.reduce(
    (all, folderContents) => (folderContents ? all.concat(folderContents) : all),
    []
  );
}

Then benchmark everything using:

const benchMark = async () => {
  const dir = path.dirname(__filename);
  const allFileData = await walk(dir);
  console.log(`Total files: ${allFileData.length}`);

  let start = new Date();
  const result1 = groupFilesBySize(allFileData);
  const time1 = new Date() - start;

  start = new Date();
  const result2 = groupFilesBySizeFunctional(allFileData);
  const time2 = new Date() - start;

  console.log('\nFINAL REPORT:')
  console.log(`Are results equal? ${JSON.stringify(result1) === JSON.stringify(result2)}`);
  console.log(`Non functional approach: ${time1} ms`);
  console.log(`Functional approach: ${time2} ms`);
};

To have sizable data, I chose to install the node package eslint, so that I have to group all files in the node_modules folder: npm install eslint. Output on my machine:

Total files: 6229

FINAL REPORT:
Are results equal? true
Non functional approach: 6 ms
Functional approach: 34557 ms
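A side note on the measurement: `new Date()` has millisecond resolution, so a figure like 6 ms is coarse. Below is a minimal sketch of a finer-grained timer, assuming Node's built-in `perf_hooks` module. `timeIt` is a hypothetical helper name and is not used in the benchmark above.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper: runs fn once and returns its result
// together with the elapsed time in (fractional) milliseconds.
const timeIt = (fn) => {
  const start = performance.now();
  const result = fn();
  return { result, ms: performance.now() - start };
};

// Usage with a stand-in workload; swap in groupFilesBySize(allFileData).
const { result, ms } = timeIt(() => [3, 1, 2].sort());
console.log(result, `${ms.toFixed(3)} ms`);
```

A single run is still noisy, so repeating the call several times and comparing the spread of timings gives a more trustworthy picture.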

If you want to make use of the functional programming paradigm, then make sure that you're using functional data structures, such as those provided by Immutable.js.

const { Map, List } = Immutable;

const groupFilesBySize = allFileData =>
  allFileData.reduce(
    (filesSortedBySize, { size, file }) =>
      filesSortedBySize.update(size, List(), list => list.push(file)),
    Map()
  );

const allFileData = [
  { size: 12, file: "Hello World!" },
  { size: 3, file: "foo" },
  { size: 3, file: "bar" },
  { size: 6, file: "foobar" },
  { size: 12, file: "Hello World!" },
  { size: 4, file: "fizz" },
  { size: 4, file: "buzz" },
  { size: 8, file: "fizzbuzz" },
];

console.time("groupFilesBySize");
for (let i = 0; i < 1e6; i++) groupFilesBySize(allFileData);
console.timeEnd("groupFilesBySize");

console.log(groupFilesBySize(allFileData));

<script src="https://cdnjs.cloudflare.com/ajax/libs/immutable/4.0.0-rc.12/immutable.min.js"></script>

On my machine it takes about 3 seconds to run one million iterations. Compare that to your original solution.

const groupFilesBySize = (allFileData) => {
  const filesSortedBySize = {};
  for (const fileData of allFileData) {
    if (fileData.size in filesSortedBySize) {
      filesSortedBySize[fileData.size].push(fileData.file);
    } else {
      filesSortedBySize[fileData.size] = [fileData.file];
    }
  }
  return filesSortedBySize;
};

const allFileData = [
  { size: 12, file: "Hello World!" },
  { size: 3, file: "foo" },
  { size: 3, file: "bar" },
  { size: 6, file: "foobar" },
  { size: 12, file: "Hello World!" },
  { size: 4, file: "fizz" },
  { size: 4, file: "buzz" },
  { size: 8, file: "fizzbuzz" },
];

console.time("groupFilesBySize");
for (let i = 0; i < 1e6; i++) groupFilesBySize(allFileData);
console.timeEnd("groupFilesBySize");

console.log(groupFilesBySize(allFileData));

On my machine it takes about 400 milliseconds to run one million iterations. Hence, the functional program is less than 10x slower than the imperative program.

In conclusion, don't use the functional programming paradigm with imperative data structures like objects and arrays. It's slow and it's messy. Use functional data structures instead.

If you mutate the accumulator inside the reduce there is no problem, and you will gain some performance.

const groupFilesBySizeFunctional = allFileData =>
  allFileData.reduce(
    (filesSortedBySize, fileData) =>
      Object.assign(filesSortedBySize, {
        [fileData.size]: filesSortedBySize[fileData.size]
          ? [...filesSortedBySize[fileData.size], fileData.file]
          : [fileData.file]
      }),
    {}
  );
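The version above still copies each bucket array with spread. As a possible further step, here is a sketch that also appends to the buckets in place. It relies on the assumption that the accumulator is created inside the call and is not visible to any caller until the reduce finishes, so the function still behaves as a pure function from the outside. The name `groupFilesBySizeLocalMutation` is made up for this illustration.

```javascript
// Sketch: every object and array mutated here is created inside this call,
// so the mutation is invisible to callers and the input is left untouched.
const groupFilesBySizeLocalMutation = (allFileData) =>
  allFileData.reduce((filesSortedBySize, { size, file }) => {
    if (!filesSortedBySize[size]) {
      filesSortedBySize[size] = []; // first file of this size
    }
    filesSortedBySize[size].push(file); // append in place, no copying
    return filesSortedBySize;
  }, {});

const sample = [
  { size: 3, file: 'foo' },
  { size: 3, file: 'bar' },
  { size: 6, file: 'foobar' },
];
console.log(JSON.stringify(groupFilesBySizeLocalMutation(sample)));
// {"3":["foo","bar"],"6":["foobar"]}
```

This does the same work per file as the imperative loop in the question, so it should perform comparably, though that is an expectation and not a measured result.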
