(C++) Leetcode: why is my code so much slower than the sample top solution? (547. Number of Provinces)

Question

https://leetcode.com/problems/number-of-provinces/

I was pretty excited when I solved this problem on my very first try, within only 20 to 30 minutes, but when I submitted my code, I ended up in the 8.43 percentile. I looked at how the fastest solutions approached the problem, and lo and behold, the sample top solution is nearly identical to my code, yet it runs 3x faster. I've been comparing the two and can't really point out a substantial enough difference. Both should be equally fast... Can anyone explain why? If I'm not mistaken, it's O(mn) performance in both cases.

The following is my code. It's pretty self-explanatory, so I'm not sure heavy commenting would do any good.

class Solution {
public:
    int findCircleNum(vector<vector<int>>& isConnected) {
        int components = 0;
        vector<bool> visited (isConnected.size(), false);
        
        // go through each row
        for (int i = 0; i < isConnected.size(); i++) {
            // explore only unvisited items
            if (!visited[i]) {
                queue<int> q;
                
                q.push(i);
                components++;
            
                while (!q.empty()) {
                    int node = q.front();
                    q.pop();
                    visited[node] = true;
                    
                    // push all direct connections onto the queue so we explore them
                    for (int j = 0; j < isConnected[0].size(); j++) {
                        if (isConnected[node][j] == 1 && !visited[j]) {
                            q.push(j);
                        }
                    }
                }
            }
        }
        
        return components;
    }
};

And the following is a sample top solution that runs 3x faster than my code.

class Solution {
public:
    int findCircleNum(vector<vector<int>>& M) {
        if (M.empty()) {
            return 0;
        }
        int count = 0;
        vector<bool> visited(M.size());
        auto bfs = [&](int student) {
            queue<int> q;
            q.push(student);
            visited[student] = true;
                    
            while (!q.empty()) {
                auto current = q.front();
                cout << "current " << current << endl;
                q.pop();
                        
                for (int i = 0; i < M.size(); i++) {
                    if (M[current][i] == 1 and !visited[i]) {
                        visited[i] = true;
                        q.push(i);
                    }
                }
            }
        };
        for (int r = 0; r < M.size(); r++) {
                if (visited[r] == false) {
                    count++;
                    bfs(r);
                }
        }
        return count;
    }
};

Answer 1

As far as I can [see][1], the difference is the placement of the statement visited[i] = true; in the "best" code. Marking the node inside the inner loop, right after the check, results in a few fewer memory accesses per iteration, whereas the OP's code needs to re-fetch the bool.

There might also be a data or control flow dependence between

visited[node] = true;

and

!visited[j]

in the OP's code that is not present in the "best" code.
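Besides the per-iteration memory traffic, the placement of the visited mark also determines how often a node can enter the queue, which may be worth measuring. The following is a minimal sketch, not part of either posted solution: countPushes and completeGraph are hypothetical helper names introduced here for illustration. It runs the same BFS with the mark applied either when a node is popped (as in the OP's code) or when it is pushed (as in the "best" code) and returns the total number of queue pushes.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper (not from the original post): builds an adjacency
// matrix in which every node is connected to every node, itself included.
std::vector<std::vector<int>> completeGraph(std::size_t n) {
    return std::vector<std::vector<int>>(n, std::vector<int>(n, 1));
}

// Hypothetical helper (not from the original post): runs the BFS from the
// question over adjacency matrix g and returns the total number of pushes.
// markOnPush == false mimics the OP's code (node marked when popped);
// markOnPush == true mimics the "best" code (node marked when pushed).
std::size_t countPushes(const std::vector<std::vector<int>>& g, bool markOnPush) {
    const std::size_t n = g.size();
    std::vector<bool> visited(n, false);
    std::size_t pushes = 0;

    for (std::size_t start = 0; start < n; ++start) {
        if (visited[start]) continue;

        std::queue<std::size_t> q;
        q.push(start);
        ++pushes;
        if (markOnPush) visited[start] = true;

        while (!q.empty()) {
            const std::size_t node = q.front();
            q.pop();
            if (!markOnPush) visited[node] = true;

            for (std::size_t j = 0; j < n; ++j) {
                if (g[node][j] == 1 && !visited[j]) {
                    // With mark-on-pop, j stays unvisited until it is popped,
                    // so other nodes popped earlier can push it again.
                    if (markOnPush) visited[j] = true;
                    q.push(j);
                    ++pushes;
                }
            }
        }
    }
    return pushes;
}
```

For a fully connected matrix with 4 nodes, countPushes(completeGraph(4), true) returns 4, while countPushes(completeGraph(4), false) returns 7. With 200 nodes the counts are 200 and 19901. Every extra push costs another full scan of a matrix row, so on dense inputs this effect can outweigh the per-iteration difference visible in the assembly.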

OP code inner loop:

.L118:
        mov     rax, QWORD PTR [rsi+rbx]
        cmp     DWORD PTR [rax+rcx*4], 1
        jne     .L116

        mov     rax, rbp
        mov     r8, rcx
        sal     rax, cl
        mov     rcx, QWORD PTR [rsp+80]
        shr     r8, 6
        and     rax, QWORD PTR [rcx+r8*8]
        jne     .L116
        mov     rax, QWORD PTR [rsp+192]
        sub     rax, 4
        cmp     rdi, rax
        je      .L117

"Best" code inner loop:

.L76:
        mov     rax, QWORD PTR [rsi+rbx]
        cmp     DWORD PTR [rax+rcx*4], 1
        jne     .L74

        mov     rax, QWORD PTR [r12]
        mov     rsi, rcx
        shr     rsi, 6
        mov     rax, QWORD PTR [rax]
        lea     rsi, [rax+rsi*8]
        mov     eax, 1
        sal     rax, cl
        mov     rcx, QWORD PTR [rsi]
        test    rcx, rax
        jne     .L74
        or      rax, rcx <------------ visited[i] = true;
        mov     QWORD PTR [rsi], rax
        mov     rax, QWORD PTR [rsp+96]
        sub     rax, 4
        cmp     r8, rax
        je      .L75
        mov     DWORD PTR [r8], edx
        add     r8, 4
        mov     QWORD PTR [rsp+80], r8
        jmp     .L74


  [1]: https://godbolt.org/z/obfqf7
