(C++) Leetcode: why is my code so much slower than the sample top solution? (547. Number of Provinces)
https://leetcode.com/problems/number-of-provinces/
I was pretty excited when I solved this problem on my very first try, within only 20 to 30 minutes, but when I submitted my code, I ended up in the 8.43rd percentile. I looked at how the fastest solutions approached the problem, and lo and behold, the sample top solution is nearly identical to my code, yet it runs 3x faster.

I've been comparing the two and can't point out a substantial enough difference. Both should be equally fast... Can anyone explain why? If I'm not mistaken, it's O(mn) performance in both cases.
The following is my code. It's pretty self-explanatory, so I'm not sure heavy commenting would do any good.

    class Solution {
    public:
        int findCircleNum(vector<vector<int>>& isConnected) {
            int components = 0;
            vector<bool> visited (isConnected.size(), false);
            // go through each row
            for (int i = 0; i < isConnected.size(); i++) {
                // explore only unvisited items
                if (!visited[i]) {
                    queue<int> q;
                    q.push(i);
                    components++;
                    while (!q.empty()) {
                        int node = q.front();
                        q.pop();
                        visited[node] = true;
                        // push all direct connections onto the queue so we explore them
                        for (int j = 0; j < isConnected[0].size(); j++) {
                            if (isConnected[node][j] == 1 && !visited[j]) {
                                q.push(j);
                            }
                        }
                    }
                }
            }
            return components;
        }
    };

And the following is a sample top solution that runs 3x faster than my code.

    class Solution {
    public:
        int findCircleNum(vector<vector<int>>& M) {
            if (M.empty()) {
                return 0;
            }
            int count = 0;
            vector<bool> visited(M.size());
            auto bfs = [&](int student) {
                queue<int> q;
                q.push(student);
                visited[student] = true;
                while (!q.empty()) {
                    auto current = q.front();
                    cout << "current " << current << endl;
                    q.pop();
                    for (int i = 0; i < M.size(); i++) {
                        if (M[current][i] == 1 and !visited[i]) {
                            visited[i] = true;
                            q.push(i);
                        }
                    }
                }
            };
            for (int r = 0; r < M.size(); r++) {
                if (visited[r] == false) {
                    count++;
                    bfs(r);
                }
            }
            return count;
        }
    };

The difference, as far as I can [see][1], is the placement of `visited[i] = true;`, which causes a few fewer memory accesses per iteration, whereas the OP's code needs to re-fetch the bool.

There might also be a data or control-flow dependence between `visited[node] = true;` and `!visited[j]` that is not present in the "best" code.
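
To make the placement difference concrete, here is a minimal sketch of the question's BFS with one change: a node is marked as visited at the moment it is pushed, as the "best" code does. The function name `findCircleNumMarkOnPush` and the `assert` on square input are my own additions for illustration, not part of either original solution.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Sketch only: the question's BFS, but nodes are marked as visited
// when they are pushed rather than when they are popped.
int findCircleNumMarkOnPush(const std::vector<std::vector<int>>& isConnected) {
    const std::size_t n = isConnected.size();
    std::vector<bool> visited(n, false);
    int components = 0;

    for (std::size_t i = 0; i < n; ++i) {
        if (visited[i]) {
            continue;  // already reached from an earlier start node
        }
        ++components;
        std::queue<std::size_t> q;
        visited[i] = true;  // mark before pushing
        q.push(i);

        while (!q.empty()) {
            const std::size_t node = q.front();
            q.pop();
            assert(isConnected[node].size() == n);  // assumed: square matrix
            for (std::size_t j = 0; j < n; ++j) {
                if (isConnected[node][j] == 1 && !visited[j]) {
                    visited[j] = true;  // test and set in one place
                    q.push(j);
                }
            }
        }
    }
    return components;
}
```

With this shape the `visited` bit for `j` is read and written in the same branch, so nothing has to be fetched again after the pop.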
OP code inner loop:

    .L118:
            mov     rax, QWORD PTR [rsi+rbx]
            cmp     DWORD PTR [rax+rcx*4], 1
            jne     .L116
            mov     rax, rbp
            mov     r8, rcx
            sal     rax, cl
            mov     rcx, QWORD PTR [rsp+80]
            shr     r8, 6
            and     rax, QWORD PTR [rcx+r8*8]
            jne     .L116
            mov     rax, QWORD PTR [rsp+192]
            sub     rax, 4
            cmp     rdi, rax
            je      .L117

"Best" code inner loop:

    .L76:
            mov     rax, QWORD PTR [rsi+rbx]
            cmp     DWORD PTR [rax+rcx*4], 1
            jne     .L74
            mov     rax, QWORD PTR [r12]
            mov     rsi, rcx
            shr     rsi, 6
            mov     rax, QWORD PTR [rax]
            lea     rsi, [rax+rsi*8]
            mov     eax, 1
            sal     rax, cl
            mov     rcx, QWORD PTR [rsi]
            test    rcx, rax
            jne     .L74
            or      rax, rcx                ; <-- visited[i] = true;
            mov     QWORD PTR [rsi], rax
            mov     rax, QWORD PTR [rsp+96]
            sub     rax, 4
            cmp     r8, rax
            je      .L75
            mov     DWORD PTR [r8], edx
            add     r8, 4
            mov     QWORD PTR [rsp+80], r8
            jmp     .L74

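
A related effect that the listings above do not show, and that I suspect matters at least as much: when a node is only marked after it is popped, the same node can be pushed onto the queue several times before its first pop. The sketch below counts `q.push` calls for both strategies. `countPushes` and its `markOnPush` flag are hypothetical names introduced here, and the fully connected graph is an assumed worst case, not LeetCode's actual test data.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper: runs one BFS from node 0 over a fully connected
// graph of n nodes and returns how many times q.push was called.
// markOnPush == true  mirrors the "best" code (mark when pushing).
// markOnPush == false mirrors the question's code (mark when popping).
std::size_t countPushes(std::size_t n, bool markOnPush) {
    if (n == 0) {
        return 0;
    }
    std::vector<bool> visited(n, false);
    std::queue<std::size_t> q;
    std::size_t pushes = 0;

    q.push(0);
    ++pushes;
    if (markOnPush) {
        visited[0] = true;
    }

    while (!q.empty()) {
        const std::size_t node = q.front();
        q.pop();
        if (!markOnPush) {
            visited[node] = true;
        }
        // Every pair of nodes is connected in this assumed graph,
        // so only the visited check decides whether j is pushed.
        for (std::size_t j = 0; j < n; ++j) {
            if (!visited[j]) {
                if (markOnPush) {
                    visited[j] = true;
                }
                q.push(j);
                ++pushes;
            }
        }
    }
    return pushes;
}
```

On this graph, marking on push performs n pushes, while marking on pop performs 1 + n(n-1)/2 pushes, for example 7 instead of 4 when n is 4. Every extra pop also rescans a full row of the matrix, so the cost of the duplicates grows quickly with n.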
[1]: https://godbolt.org/z/obfqf7