Find deep difference between nested arrays of objects with lodash, underscore or vanilla js

Objects I want to compare:

const prevObj = {
    arr: [
        {
            name: 'Ankur',
        },
        {
            email: 'ankur4736@gmail.com',
        },
        {
            key: 456,
        },
        anotherArr = [{
            name: 'Shubham',
            anotherNestedArr =[{
                name: 'Gundeep Paji'
            }]
        }]
    ],
}

const newObj = {
    arr: [
        {
            name: 'Shubham',
        },
        {
            email: 'shubham@gmail.com',
        },
        {
            key: 466,
        },
        anotherArr = [{
            name: 'Gundeep Paji',
            anotherNestedArr =[{
                name: 'Ankur'
            }]
        }]
    ],
}

function difference(oldValues, newValues) {
    function changes(oldValues, newValues) {
        return _.transform(newValues, function (result, value, key) {
            if (!_.isEqual(value, oldValues[key])) {
                if (_.isArray(value)) {
                    const oldArr = oldValues[key];
                    const newArr = value;
                    /* Needs solution for this case*/
                }
                /* This will execute if the value is a nested object */
                if (_.isObject(value) && _.isObject(oldValues[key])) {
                    // Pass the arguments in (old, new) order, matching the signature
                    result[key] = changes(oldValues[key], value)
                }
                /* This will execute if the value is not nested */
                else {
                    result[key] = {
                        oldValue: oldValues[key],
                        newValue: value
                    };
                }
            }
        });
    }

    return changes(oldValues, newValues);
}

I am creating an algorithm to compare deeply nested objects, and I am using lodash to compare the changes in the object. For objects my code works, but I want a solution for when the data type is an array. I also need the old value of each changed value.

Understanding the problem

The example objects in the question use invalid notation and also assign a global variable anotherArr as a side effect, which was probably unintended. In this answer, I will assume the following notation instead:

const prevObj = {
    arr: [{
        name: 'Ankur',
    }, {
        email: 'ankur4736@gmail.com',
    }, {
        key: 456,
    }, [{
        name: 'Shubham',
        anotherNestedArr: [{
            name: 'Gundeep Paji'
        }],
    }]],
};

const newObj = {
    arr: [{
        name: 'Shubham',
    }, {
        email: 'shubham@gmail.com',
    }, {
        key: 466,
    }, [{
        name: 'Gundeep Paji',
        anotherNestedArr: [{
            name: 'Ankur'
        }],
    }]],
};

If I'm understanding the question correctly, you not only want to perform a deep comparison between the objects, but also want to create a new object that describes the differences. You state that your current code works for objects but not for arrays. From your code, I gather that you want the diff to look like this:

const diff = {
    arr: [{
        name: {
            oldValue: 'Ankur',
            newValue: 'Shubham',
        },
    }, {
        email: {
            oldValue: 'ankur4736@gmail.com',
            newValue: 'shubham@gmail.com',
        },
    }, {
        key: {
            oldValue: 456,
            newValue: 466,
        },
    }, [{
        name: {
            oldValue: 'Shubham',
            newValue: 'Gundeep Paji',
        },
        anotherNestedArr: [{
            name: {
                oldValue: 'Gundeep Paji',
                newValue: 'Ankur',
            },
        }],
    }]],
};

Sparse vs. dense arrays

Before we commence, I should point out that there are two different ways to create a new array containing only the values from the old array that have changed in the new array. For simplicity, I will use two arrays of numbers as an example:

const oldArr = [1, 2, 3, 4, 5, 6, 7];
const newArr = [1, 1, 3, 4, 7, 6, 7];
// differences:    ^        ^

For the sake of illustration, assume for a moment that we only store the old values in the diff, instead of {oldValue, newValue} objects. Our diff array will start out empty either way, but if we choose the dense way of representing the diff, we simply append every element of oldArr that has changed in newArr to the end:

const denseDiff = [2, 5];

The advantage of this representation is that denseDiff.length tells us a straightforward truth, i.e., that two elements have changed. The disadvantage is that we have no way to tell what the original indices of the changed elements were.
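To make the dense representation concrete, here is a minimal vanilla JavaScript sketch. The helper name denseDiff is my own, and like the illustration it stores only the old values:

```javascript
// Dense diff: append each old value that changed, discarding its index.
function denseDiff(oldArr, newArr) {
    const diff = [];
    oldArr.forEach((oldValue, i) => {
        if (oldValue !== newArr[i]) diff.push(oldValue);
    });
    return diff;
}

const oldArr = [1, 2, 3, 4, 5, 6, 7];
const newArr = [1, 1, 3, 4, 7, 6, 7];
console.log(denseDiff(oldArr, newArr)); // [ 2, 5 ]
```

Nothing in the result records that 2 came from index 1 and 5 from index 4.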

The sparse way is more complicated:

const sparseDiff = [, 2, , , 5];

sparseDiff.length, which is 5, no longer tells us the number of changed elements. Instead, it only conveys the highest index of a changed element, plus one. In addition, if we take an index between 0 and sparseDiff.length that didn't change, for example sparseDiff[3], it will always give us undefined, whose meaning is not clear.

There is a lot of room for confusion here, which generally isn't a good thing in programming. However, the sparse way has the major advantage that the indices of changed elements are always the same in oldArr and sparseDiff. That is, we maintain the following invariant for every index i:

oldArr[i] == newArr[i] ? sparseDiff[i] == undefined : sparseDiff[i] == oldArr[i]

Fortunately, there is also a way to find directly which indices correspond to the changed elements. Object.keys tells us the indices of just the elements that were set explicitly (because they changed). Underscore's _.keys behaves the same way, but be aware that Lodash's _.keys treats arrays as dense and returns every index, holes included:

Object.keys(sparseDiff)  // ['1', '4']

Object.keys(sparseDiff).length even tells us the number of elements that changed. We can get all the information we could obtain from the dense representation, just in a more roundabout and confusing way.
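A matching sketch of the sparse representation, again in vanilla JavaScript with a hypothetical sparseDiff helper. It shows the holes, the misleading length, and how Object.keys recovers what the dense form gave us directly:

```javascript
// Sparse diff: store each changed old value at the index it had in oldArr.
// Unchanged indices are left as holes rather than being set to undefined.
function sparseDiff(oldArr, newArr) {
    const diff = [];
    oldArr.forEach((oldValue, i) => {
        if (oldValue !== newArr[i]) diff[i] = oldValue;
    });
    return diff;
}

const oldArr = [1, 2, 3, 4, 5, 6, 7];
const newArr = [1, 1, 3, 4, 7, 6, 7];
const diff = sparseDiff(oldArr, newArr);

console.log(diff.length);              // 5 (highest changed index + 1)
console.log(diff[3]);                  // undefined (a hole: index 3 did not change)
console.log(Object.keys(diff));        // [ '1', '4' ]
console.log(Object.keys(diff).length); // 2 (number of changed elements)

// Recovering the dense representation from the sparse one:
console.log(Object.keys(diff).map((i) => diff[i])); // [ 2, 5 ]
```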

In conclusion, sparse arrays are arguably the correct approach here, because they allow us to obtain all the information we might need. I pointed out the disadvantages of sparse arrays anyway, to highlight that there is still a tradeoff.

Your code

Before jumping to solutions, let's take this opportunity to learn as much as possible. There are a couple of things in your code that stood out to me.

Double outer function

This is the first thing I noticed:

function difference(oldValues, newValues) {
    function changes(oldValues, newValues) {
        //...
    }

    return changes(oldValues, newValues);
}

The difference function simply forwards its arguments, in the same order, to changes and then passes its return value back unchanged. That means that difference isn't really doing anything. You could remove the outermost difference function, call changes instead of difference, and still get the same result:

function changes(oldValues, newValues) {
    //...
}

_.transform

The second thing I noticed is that you're using _.transform. If we look at how this function was conceived, it turns out to be a variant of _.reduce that can stop iteration early when the iteratee returns false. Since the iteratee must be able to return false, but we must also be able to obtain a result at the end, the iteratee has to modify the accumulator in place instead of returning it as we would do in _.reduce. There are two major problems with this.

Firstly, modifying in place doesn't work with immutable values such as strings. For this reason, _.transform only works on objects and arrays. Consider the following function to reverse a string, which works with _.reduce but not with _.transform:

function reverse(string) {
    // Start from '' so that an empty string yields '' rather than undefined.
    return _.reduce(string, (result, char) => char + result, '');
}
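For comparison, the same reduction with the built-in Array.prototype.reduce, as a vanilla sketch (spreading the string into characters is my addition). Because each step returns a new accumulator instead of mutating one, an immutable string works fine as the accumulator:

```javascript
// Reverse a string by prepending each character to the accumulator.
// Every step produces a new string; nothing is modified in place.
function reverse(string) {
    return [...string].reduce((result, char) => char + result, '');
}

console.log(reverse('lodash')); // hsadol
```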

You could work around this by wrapping the result in an object and unwrapping it afterwards, but that means working around the limitations of _.transform when you should have used _.reduce in the first place.

Secondly, and more importantly, partially transforming an object makes no sense. The keys of an object are conceptually unordered (JavaScript does define an enumeration order, integer-like keys first and then insertion order, but that order carries no meaning for the data). Suppose that during iteration, you hit the stop condition on a particular key. By pure coincidence, some of the other keys have already been visited while others haven't yet. There is no defensible reason why the keys that have already been visited should be part of your transformed result while the other keys shouldn't, since this division is completely arbitrary.

_.transform is useless for immutable values and meaningless for unordered collections. What about ordered collections, such as arrays? At first sight, there are some reasonable scenarios where you might want to partially transform an array. For example, to find the first running total, adding up the numbers of an array from the start, that exceeds a threshold:

function thresholdPartialSum(numbers, threshold) {
    return _.transform(numbers, function(result, number) {
        result.wrapped += number;
        return result.wrapped <= threshold;
    }, { wrapped: 0 }).wrapped;
}

Note that wrapped appears in four places. Here, too, we are working around _.transform's limitation of having to modify the accumulator in place.

Admittedly, _.transform has a performance advantage over _.reduce in this case, because it can stop early. However, we need neither. Partially iterating arrays is already handled perfectly well by good old _.find and similar functions. Moreover, since we have to modify the accumulator in place anyway, we might as well close over it. _.find handles scenarios like these just as well as _.transform, with less complexity:

function thresholdPartialSum(numbers, threshold) {
    let partialSum = 0;
    // _.find stops iterating as soon as the callback returns true.
    _.find(numbers, function(number) {
        partialSum += number;
        return partialSum > threshold;
    });
    return partialSum;
}

In conclusion, you should never use _.transform. Use _.reduce or _.find instead.
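The same early exit is available without any library: Array.prototype.some stops at the first callback that returns a truthy value. A vanilla sketch, reusing the function name from above:

```javascript
// Add numbers from the start of the array until the running total exceeds
// the threshold. `some` stops iterating at the first truthy callback result.
function thresholdPartialSum(numbers, threshold) {
    let partialSum = 0;
    numbers.some((number) => {
        partialSum += number;
        return partialSum > threshold;
    });
    return partialSum;
}

console.log(thresholdPartialSum([3, 1, 4, 1, 5, 9], 7)); // 8
console.log(thresholdPartialSum([1, 2], 10));            // 3 (threshold never exceeded)
```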

Duplicate recursion

This is the third thing I noticed:

if (!_.isEqual(value, oldValues[key])) {
    // recursion here
}

While this will not lead to incorrect results, _.isEqual is itself recursive. Its return value depends on whether it finds any difference. This means that when a difference is found, you end up recursively iterating both values again, just to rediscover the difference that _.isEqual already identified but whose position it didn't tell you.

At first glance, this may not seem so problematic. You might recurse twice instead of once; not the most efficient approach, but at most twice as slow, right? Unfortunately, the duplication doesn't stop there. Once you recursively call your own changes function, _.isEqual will be invoked again for each subelement of the element you recursed into, and it will recurse again.

To quantify this, let's start by visualizing a nested object as a tree.

const tree = {
    mainBranch1: {
        branch1_1: {
            twig1_1_1: {
                leaf1_1_1_1: 'value 1.1.1.1',
                leaf1_1_1_2: 'value 1.1.1.2',
            },
            twig1_1_2: {
                leaf1_1_2_1: 'value 1.1.2.1',
                leaf1_1_2_2: 'value 1.1.2.2',
            },
        },
        branch1_2: {
            twig1_2_1: {
                leaf1_2_1_1: 'value 1.2.1.1',
                leaf1_2_1_2: 'value 1.2.1.2',
            },
            twig1_2_2: {
                leaf1_2_2_1: 'value 1.2.2.1',
                leaf1_2_2_2: 'value 1.2.2.2',
            },
        },
    },
    mainBranch2: {
        branch2_1: {
            twig2_1_1: {
                leaf2_1_1_1: 'value 2.1.1.1',
                leaf2_1_1_2: 'value 2.1.1.2',
            },
            twig2_1_2: {
                leaf2_1_2_1: 'value 2.1.2.1',
                leaf2_1_2_2: 'value 2.1.2.2',
            },
        },
        branch2_2: {
            twig2_2_1: {
                leaf2_2_1_1: 'value 2.2.1.1',
                leaf2_2_1_2: 'value 2.2.1.2',
            },
            twig2_2_2: {
                leaf2_2_2_1: 'value 2.2.2.1',
                leaf2_2_2_2: 'value 2.2.2.2',
            },
        },
    },
};

We have a tree with four levels of branching (not including tree itself). There are sixteen leaves.

Recursion stops at the leaf keys, so let's count how many times _.isEqual visits a leaf key. Assume that we are comparing tree to another object with a similar structure, and that about half of the leaf values have changed.

First, we call changes on tree and the other object. We invoke _.isEqual once for each mainBranch. _.isEqual recursively calls itself until it reaches the leaf keys. Let's say it needs to visit about half of the leaves before it finds a difference, at which point it knows it must return false. At this point, we have visited each leaf half a time on average.

Every time _.isEqual returns false, we recursively call changes on the corresponding main branch. Each call to changes then invokes _.isEqual once for each branch within it. _.isEqual recursively calls itself again until it reaches the leaf keys. Now we have visited each leaf once on average.

_.isEqual again returns false every time, so we recurse into changes again. The whole pattern repeats and we're at one and a half visits per leaf.

Finally, we repeat the whole pattern one more time with the twigs, and we end up at two invocations of _.isEqual per leaf key on average. That means 16 * 2 = 32 invocations of _.isEqual on a leaf key in total. In general, the total number of leaf visits equals the depth of the tree times the number of leaves, divided by two (here 4 * 16 / 2 = 32). Since the depth of a balanced tree grows logarithmically with the number of leaves, complexity analysis calls this log-linear or quasilinear time.

This is the worst case. With fewer than half of the leaves differing, changes will recurse less often, while with more than half of the leaves differing, _.isEqual will stop its own recursion sooner. Nevertheless, without the duplicate recursion, we would need to visit each leaf only once, regardless of the depth of the tree. After all, when we determine whether a key has changed, we might as well record the difference directly. With large, deeply nested structures this can make a big difference, especially in a hot loop.

Removing the duplication requires that we change the algorithm from a pre-order traversal to a post-order traversal. That is, instead of first determining whether a larger branch contains any differences and then finding those differences within each of its smaller branches, we go straight to collecting all the differences within the smallest branches. If we find any differences within a smaller branch, we then know that we should also record the larger branch that contains it.
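A small vanilla JavaScript experiment can make this concrete. It is only a sketch: isEqual below is a hypothetical stand-in for _.isEqual (not Lodash's implementation), the generated tree mirrors the shape of tree, and a counter records every leaf comparison. The exact pre-order count depends on how quickly isEqual hits a difference in each subtree, so the code only checks that it exceeds the post-order count:

```javascript
// Counts every comparison of two leaf (non-object) values.
let leafVisits = 0;

const isObj = (v) => typeof v === 'object' && v !== null;

// Deep equality that stops at the first difference. Like _.isEqual, it only
// answers true/false and does not say where the difference is.
function isEqual(a, b) {
    if (!isObj(a) || !isObj(b)) {
        leafVisits++;
        return a === b;
    }
    const keys = Object.keys(a);
    if (keys.length !== Object.keys(b).length) return false;
    return keys.every((key) => isEqual(a[key], b[key]));
}

// Pre-order: ask isEqual first, then recurse again to locate the difference.
function changesPreOrder(oldValues, newValues) {
    const result = {};
    for (const key of Object.keys(oldValues)) {
        const oldValue = oldValues[key];
        const newValue = newValues[key];
        if (isEqual(oldValue, newValue)) continue;
        result[key] = isObj(oldValue) && isObj(newValue)
            ? changesPreOrder(oldValue, newValue)
            : { oldValue, newValue };
    }
    return result;
}

// Post-order: recurse first and keep a branch only if it contains differences.
function changesPostOrder(oldValues, newValues) {
    const result = {};
    for (const key of Object.keys(oldValues)) {
        const oldValue = oldValues[key];
        const newValue = newValues[key];
        if (isObj(oldValue) && isObj(newValue)) {
            const diff = changesPostOrder(oldValue, newValue);
            if (Object.keys(diff).length > 0) result[key] = diff;
        } else {
            leafVisits++;
            if (oldValue !== newValue) result[key] = { oldValue, newValue };
        }
    }
    return result;
}

// A complete binary tree with four levels below the root (16 leaves),
// plus a copy in which every leaf whose path ends in 'b' has changed.
function buildTree(depth, path = '') {
    if (depth === 0) return `value ${path}`;
    return { a: buildTree(depth - 1, path + 'a'), b: buildTree(depth - 1, path + 'b') };
}
function changeHalf(node, path = '') {
    if (!isObj(node)) return path.endsWith('b') ? `${node} (changed)` : node;
    return { a: changeHalf(node.a, path + 'a'), b: changeHalf(node.b, path + 'b') };
}

const oldTree = buildTree(4);
const newTree = changeHalf(oldTree);

leafVisits = 0;
const preDiff = changesPreOrder(oldTree, newTree);
const preVisits = leafVisits;

leafVisits = 0;
const postDiff = changesPostOrder(oldTree, newTree);
const postVisits = leafVisits;

console.log(postVisits);             // 16: each leaf is compared exactly once
console.log(preVisits > postVisits); // true: isEqual re-walks subtrees at every level
```

Both functions return the same diff; only the amount of work differs.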

Solution

Enough theory. The following code addresses the issues I discussed above and works for arrays as well as objects. It works regardless of whether you're using Underscore or Lodash.

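function changes(oldCollection, newCollection) {
    // The accumulator is an array for arrays (yielding a sparse diff that
    // keeps the original indices) and a plain object otherwise.
    return _.reduce(oldCollection, function(result, oldValue, key) {
        const newValue = newCollection[key];
        if (_.isObject(oldValue) && _.isObject(newValue)) {
            // Post-order: diff the children first and keep this branch
            // only if something inside it changed.
            const diff = changes(oldValue, newValue);
            if (!_.isEmpty(diff)) result[key] = diff;
        } else if (oldValue !== newValue) {
            // Changed leaf (or a type mismatch): record both values.
            result[key] = { oldValue, newValue };
        }
        return result;
    }, _.isArray(oldCollection) ? [] : {});
}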
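Since the question also mentions vanilla JS, here is my own library-free port of the solution above, offered as a sketch rather than a drop-in replacement. isObject is a hypothetical helper that, unlike _.isObject, ignores functions, and an Object.keys length check stands in for _.isEmpty. One behavioral difference: Object.keys skips holes in sparse input arrays, whereas _.reduce visits every index.

```javascript
// Mimics _.isObject for arrays and plain objects (functions are ignored here).
const isObject = (value) => typeof value === 'object' && value !== null;

// Post-order deep diff: diff the children first and keep a branch only if
// something inside it changed. Array diffs are sparse, so indices are kept.
function changes(oldCollection, newCollection) {
    const result = Array.isArray(oldCollection) ? [] : {};
    for (const key of Object.keys(oldCollection)) {
        const oldValue = oldCollection[key];
        const newValue = newCollection[key];
        if (isObject(oldValue) && isObject(newValue)) {
            const diff = changes(oldValue, newValue);
            if (Object.keys(diff).length > 0) result[key] = diff;
        } else if (oldValue !== newValue) {
            result[key] = { oldValue, newValue };
        }
    }
    return result;
}

const prevObj = {
    arr: [{ name: 'Ankur' }, { email: 'ankur4736@gmail.com' }, { key: 456 },
        [{ name: 'Shubham', anotherNestedArr: [{ name: 'Gundeep Paji' }] }]],
};
const newObj = {
    arr: [{ name: 'Shubham' }, { email: 'shubham@gmail.com' }, { key: 466 },
        [{ name: 'Gundeep Paji', anotherNestedArr: [{ name: 'Ankur' }] }]],
};

// Logs a diff with the same shape as `diff` earlier in this answer.
console.log(JSON.stringify(changes(prevObj, newObj), null, 2));

// Only index 1 changed, so the array diff has a hole at index 0.
console.log(Object.keys(changes([1, 2, 3], [1, 5, 3]))); // [ '1' ]
```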

Final remarks

As in the original question, the diff object returned by changes contains a nested {oldValue, newValue} object for each changed leaf key. This could lead to ambiguities if either the old or the new object contains keys named oldValue or newValue. This is not as far-fetched as it may seem; consider comparing diffs taken at two different points in time.

To avoid such situations, you could opt to store only the old value (or only the new value) in the diff. Since you will know the exact location of each changed value, it is trivial to look up the corresponding key in the other object. You need to change only one line to make the algorithm behave this way, but this is left as an exercise to the reader.

Also as in the original question, the above solution does not take into account that the new collection might have keys that didn't exist in the old collection. If you want to detect new keys and can accept that deleted keys will be missed, simply iterate over newCollection instead of oldCollection. If you want to pick up missing keys on both sides, first split the keys into common keys and uncommon keys (the symmetric difference of the two key sets). Treat the common keys as above while recording the uncommon keys directly as differences. This is, again, left as an exercise to the reader.
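For reference, one possible vanilla JavaScript sketch of that last approach, under my own assumptions about the output format: a key present on only one side is recorded as {oldValue, newValue} with undefined for the missing side. That makes a deleted key indistinguishable from a key whose value really was undefined; a dedicated marker would avoid this. The helper names are hypothetical:

```javascript
const isObject = (value) => typeof value === 'object' && value !== null;

// Deep diff that also reports keys added to or removed from either side.
function changes(oldCollection, newCollection) {
    const oldKeys = Object.keys(oldCollection);
    const newKeys = Object.keys(newCollection);
    const oldKeySet = new Set(oldKeys);
    const newKeySet = new Set(newKeys);

    // Keys present on both sides, and the symmetric difference of the key sets.
    const commonKeys = oldKeys.filter((key) => newKeySet.has(key));
    const uncommonKeys = [
        ...oldKeys.filter((key) => !newKeySet.has(key)), // deleted keys
        ...newKeys.filter((key) => !oldKeySet.has(key)), // added keys
    ];

    const result = Array.isArray(oldCollection) ? [] : {};
    // Common keys are diffed exactly as in the solution above.
    for (const key of commonKeys) {
        const oldValue = oldCollection[key];
        const newValue = newCollection[key];
        if (isObject(oldValue) && isObject(newValue)) {
            const diff = changes(oldValue, newValue);
            if (Object.keys(diff).length > 0) result[key] = diff;
        } else if (oldValue !== newValue) {
            result[key] = { oldValue, newValue };
        }
    }
    // Uncommon keys are recorded directly as differences.
    for (const key of uncommonKeys) {
        result[key] = { oldValue: oldCollection[key], newValue: newCollection[key] };
    }
    return result;
}

// b changed, a was deleted, d was added.
console.log(changes({ a: 1, b: { c: 2 } }, { b: { c: 3 }, d: 4 }));
```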

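Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.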

 
© 2020-2024 STACKOOM.COM