NOTE: This is a solution for Project Euler Problem 14. If you still want to solve it yourself, don't read on.
The problem asks for the number below one million which, as a starting number for the Collatz sequence, produces the longest such sequence. My initial code was as follows:
$r = @{}
for ($i = 1; $i -lt 1000000; $i++) {
    $n = 0
    $a = $i
    while ($a -gt 1) {
        if ($r[$a]) {
            $n += $r[$a]
            break
        }
        if ($a % 2 -eq 0) {
            $a /= 2
        } else {
            $a = $a * 3 + 1
        }
        $n++
    }
    $r[$i] = $n
}
$r.GetEnumerator() | sort Value | select -Last 1 | % { $_.Key }
which tries to use a hashtable as a cache for sub-sequence lengths already computed, to save time. I was quite surprised that this script had a runtime of over eight minutes on my machine. Re-creating the same code in C#:
using System;
using System.Collections.Generic;

class Problem14
{
    public static void Main()
    {
        var ht = new Dictionary<long, int>();
        for (int i = 1; i < 1000000; i++)
        {
            int count = 0;
            long j = i;
            while (j > 1)
            {
                if (ht.ContainsKey(j))
                {
                    count += ht[j];
                    break;
                }
                if (j % 2 == 0)
                    j /= 2;
                else
                    j = 3 * j + 1;
                count++;
            }
            ht[i] = count;
        }
        KeyValuePair<long, int> max = new KeyValuePair<long, int>();
        foreach (var n in ht)
        {
            if (n.Value > max.Value)
                max = n;
        }
        Console.WriteLine(max.Key);
    }
}
had a runtime of just over one second. I know that speed of execution isn't a prime goal of PowerShell; it's an administration language, and for those tasks the ratio of PS code to cmdlet invocations is likely very different from what I'm doing here.
Still, I don't know what exactly causes the slowdown.
Suspecting the hash table, I replaced the cache with an array. This resulted in an execution time of around 200 ms in C# and about 32 minutes in PowerShell. The code was as follows:
$r = ,0 * 1000000
for ($i = 1; $i -lt 1000000; $i++) {
    $n = 0
    $a = $i
    while ($a -gt 1) {
        if ($r[$a]) {
            $n += $r[$a]
            break
        }
        if ($a % 2 -eq 0) {
            $a /= 2
        } else {
            $a = $a * 3 + 1
        }
        $n++
    }
    if ($i -lt 1000000) {
        $r[$i] = $n
    }
}
$max = 0
for ($i = 1; $i -lt 1000000; $i++) {
    if ($r[$i] -gt $r[$max]) {   # -gt, not >: in PowerShell '>' is output redirection
        $max = $i
    }
}
$max
and
using System;

class Problem14
{
    public static void Main()
    {
        var cache = new int[1000000];
        for (int i = 1; i < 1000000; i++)
        {
            int count = 0;
            long j = i;
            while (j > 1)
            {
                if (j < 1000000 && cache[j] != 0)
                {
                    count += cache[j];
                    break;
                }
                if (j % 2 == 0)
                    j /= 2;
                else
                    j = 3 * j + 1;
                count++;
            }
            cache[i] = count;
        }
        var max = 0;
        for (int i = 1; i < cache.Length; i++)
        {
            if (cache[i] > cache[max])
                max = i;
        }
        Console.WriteLine(max);
    }
}
A completely cacheless variant took around 1.2 seconds in C#. I haven't tried that in PowerShell yet.
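For reference, such a cacheless PowerShell variant would look something like this (an untimed sketch; every sequence is walked from scratch, with no $r lookup):

```powershell
$max = 0
$maxLen = 0
for ($i = 1; $i -lt 1000000; $i++) {
    $n = 0
    $a = [long]$i   # long avoids int overflow on the 3n+1 step
    while ($a -gt 1) {
        if ($a % 2 -eq 0) { $a /= 2 } else { $a = $a * 3 + 1 }
        $n++
    }
    if ($n -gt $maxLen) { $maxLen = $n; $max = $i }
}
$max
```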
Any ideas?
First and foremost, PowerShell is an interpreted language (neither jitted nor compiled). That's always going to hurt. ;-)
Secondly, there are some language constructs you can use to avoid multiple interpreted steps. For example, instead of a for(;;) statement, pipe a range:
0..1000000 | foreach-object { foo $_ }
Might help a bit.
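Applied to the outer loop of your script, that suggestion would look something like this (a sketch, not benchmarked; the body is unchanged apart from $_ replacing $i):

```powershell
# Piped range instead of a for(;;) statement driving the outer loop.
$r = @{}
1..999999 | ForEach-Object {
    $n = 0
    $a = $_
    while ($a -gt 1) {
        if ($r[$a]) { $n += $r[$a]; break }
        if ($a % 2 -eq 0) { $a /= 2 } else { $a = $a * 3 + 1 }
        $n++
    }
    $r[$_] = $n
}
```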
Most importantly, avoid excessive use of the break and continue keywords in loops; invert your logic where possible to avoid them. Internally, these keywords signal flow control using .NET exceptions, and are therefore expensive operations.
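For example, the cache-hit break in your inner loop can be folded into the while condition, something like this (a sketch; behavior should be equivalent since cached counts are positive):

```powershell
# No break statement: the loop stops either at 1 or at a cache hit.
while ($a -gt 1 -and -not $r[$a]) {
    if ($a % 2 -eq 0) { $a /= 2 } else { $a = $a * 3 + 1 }
    $n++
}
if ($a -gt 1) { $n += $r[$a] }   # loop ended on a cache hit
```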
Hope this helps your understanding somewhat.
EDIT: these keywords use .NET exceptions for signalling
From System.Management.Automation.FlowControlNode.Execute(...):
switch (this._tokenId)
{
    case TokenId.ExitToken:
    {
        int exitCode = this.GetExitCode(result);
        throw new ExitException(base.NodeToken, exitCode);
    }
    case TokenId.ReturnToken:
        throw new ReturnException(base.NodeToken, result);
    case TokenId.BreakToken:
        label = this.GetLabel(result, context);
        throw new BreakException(base.NodeToken, label);
    case TokenId.ContinueToken:
        label = this.GetLabel(result, context);
        throw new ContinueException(base.NodeToken, label);
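You can get a rough feel for that overhead yourself with Measure-Command (an unscientific sketch; timings will vary by machine and PowerShell version, so no specific numbers are claimed here):

```powershell
# Compare a loop that executes break each iteration against an empty loop.
$with    = Measure-Command { for ($i = 0; $i -lt 100000; $i++) { while ($true) { break } } }
$without = Measure-Command { for ($i = 0; $i -lt 100000; $i++) { } }
"{0:n0} ms with break, {1:n0} ms without" -f $with.TotalMilliseconds, $without.TotalMilliseconds
```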
-Oisin