F# vs. C# performance signatures with sample code

Question

There are many discussions on this topic already, but I am all about flogging dead horses, particularly when I discover they may still be breathing.

I was working on parsing the unusual and exotic file format that is the CSV, and for fun I decided to characterize the performance in the two .NET languages I know, C# and F#.

The results were... unsettling. F# won by a wide margin, a factor of 2 or more (and I actually think it's more like .5n, but getting real benchmarks is proving to be tough since I am testing against hardware IO).

Divergent performance characteristics in something as common as reading a CSV are surprising to me (note that the coefficient means that C# wins on very small files. The more testing I do, the more it feels like C# scales worse, which is both surprising and concerning, since it probably means I am doing it wrong).

Some notes: Core 2 Duo laptop, 80 GB spindle disk, 3 GB of DDR 800 memory, Windows 7 64-bit Premium, .NET 4, no power options turned on.

30,000 lines, 5 columns wide, each field one phrase of 10 characters or fewer, gives me a factor of 3 in favor of the tail call recursion after the first run (it appears to cache the file).

300,000 lines (the same data repeated) gives a factor of 2 for the tail call recursion, with F#'s mutable implementation winning out slightly, but the performance signatures suggest that I am hitting the disk and not RAM-disking the whole file, which causes semi-random performance spikes.

F# code

//Module used to import data from an arbitrary CSV source
module CSVImport
open System.IO

//imports the data from a path into a list of string arrays, one array per line
let ImportData (path:string) : List<string []> = 

    //recursively rips through the file grabbing a line and adding it to the list
    let rec readline (reader:StreamReader) (lines:List<string []>) : List<string []> =
        let line = reader.ReadLine()
        match line with
        | null -> lines
        | _ -> readline reader  (line.Split(',')::lines)

    //grab a file and open it, then return the parsed data
    use chaosfile = new StreamReader(path)
    readline chaosfile []

//a recreation of the above function using a while loop
let ImportDataWhile (path:string) : list<string []> =
    use chaosfile = new StreamReader(path)
    //values in a loop construct must be mutable
    let mutable retval = []
    //loop
    while chaosfile.EndOfStream <> true do
        retval <- chaosfile.ReadLine().Split(',')::retval 
    //return retval by just declaring it
    retval

let CSVlines (path:string) : string seq= 
    seq { use streamreader = new StreamReader(path)
          while not streamreader.EndOfStream do
            yield streamreader.ReadLine() }

let ImportDataSeq (path:string) : string [] list =
    let mutable retval = []
    let sequencer = CSVlines path
    for line in sequencer do
        retval <- line.Split()::retval
    retval

C# code

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text;

namespace CSVparse
{
    public class CSVprocess
    {
        public static List<string[]> ImportDataC(string path)
        {
            List<string[]> retval = new List<string[]>();
            using(StreamReader readfile = new StreamReader(path))
            {
                string line = readfile.ReadLine();
                while (line != null)
                {
                    retval.Add(line.Split());
                    line = readfile.ReadLine();
                }
            } 

           return retval;
        }

        public static List<string[]> ImportDataReadLines(string path)
        {
            List<string[]> retval = new List<string[]>();
            IEnumerable<string> toparse = File.ReadLines(path);

            foreach (string split in toparse)
            {
                retval.Add(split.Split());
            }
            return retval;
        }
    }

}

Note the variety of implementations there: using iterators, using sequences, using tail call optimizations, while loops in 2 languages...

A major issue is that I am hitting the disk, so some idiosyncrasies can be accounted for by that. I intend to rewrite this code to read from a memory stream (which should be more consistent, assuming I don't start to swap).
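A minimal sketch of what such a memory-stream rewrite could look like in C#, assuming the fields are meant to be split on commas. The names InMemoryCsvBench, BuildCsv, Parse and Run are hypothetical and not part of the code above; Parse is the ImportDataC loop with the path replaced by a TextReader.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Text;

public static class InMemoryCsvBench
{
    // Hypothetical helper: builds `lines` rows of `width` comma-separated fields in memory.
    public static string BuildCsv(int lines, int width)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lines; i++)
        {
            for (int j = 0; j < width; j++)
            {
                if (j > 0) sb.Append(',');
                sb.Append("abcdef");
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }

    // Same loop as ImportDataC, but reading from any TextReader instead of a file path.
    public static List<string[]> Parse(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.Add(line.Split(','));
        }
        return rows;
    }

    // Times one parse over a MemoryStream so that no disk I/O is measured.
    public static void Run()
    {
        byte[] data = Encoding.UTF8.GetBytes(BuildCsv(300000, 5));
        var sw = Stopwatch.StartNew();
        List<string[]> rows;
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            rows = Parse(reader);
        }
        sw.Stop();
        Console.WriteLine("{0} rows in {1} ms", rows.Count, sw.ElapsedMilliseconds);
    }
}
```

Calling InMemoryCsvBench.Run() several times and discarding the first run should reduce JIT noise. The F# functions would need the same TextReader treatment for the comparison to be fair.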

But everything I have been taught or have read says that while loops/for loops are faster than tail call optimizations/recursion, and every actual benchmark that I run says the dead opposite.

So I guess my question is, should I question the conventional wisdom?

Is tail call recursion really better than while loops in the .NET ecosystem?

How does this work out on Mono?

Answer 1

I think that the difference may arise from the different lists in F# and C#. F# uses singly linked lists (see http://msdn.microsoft.com/en-us/library/dd233224.aspx), whereas C# uses System.Collections.Generic.List, which is based on arrays.

Prepending an element is much faster for singly linked lists, especially when you're parsing big files (the array-based list needs to allocate and copy the whole array from time to time).

Try using a LinkedList in the C# code; I'm curious about the results :) ...
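A rough sketch of that LinkedList variant, for anyone who wants to try it. LinkedListImport and ImportDataLinked are hypothetical names, and splitting on commas is an assumption. Note that System.Collections.Generic.LinkedList is doubly linked and allocates one node per row, so it is not an exact match for the F# list.

```csharp
using System.Collections.Generic;
using System.IO;

public static class LinkedListImport
{
    // Variant of ImportDataC that appends to a linked list,
    // so adding a row never copies a backing array.
    public static LinkedList<string[]> ImportDataLinked(TextReader reader)
    {
        var rows = new LinkedList<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.AddLast(line.Split(','));
        }
        return rows;
    }

    // Path-based wrapper with the same shape as ImportDataC.
    public static LinkedList<string[]> ImportDataLinked(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return ImportDataLinked(reader);
        }
    }
}
```

Unlike the F# versions, which prepend and therefore return the rows in reverse, AddLast keeps the rows in file order.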

PS: Also, this would be a good example of when to use a profiler. You could easily find the "hot spot" of the C# code...

EDIT

So, I tried this for myself: I used two identical files in order to prevent caching effects. Each file had 3,000,000 lines, each line containing 'abcdef' 10 times, separated by commas.

The main program looks like this:

static void Main(string[] args) {
   var dt = DateTime.Now;
   CSVprocess.ImportDataC("test.csv"); // C# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
   dt = DateTime.Now;
   CSVImport.ImportData("test1.csv"); // F# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
}

(I also tried it with first executing the F# implementation and then the C#...)

The result is:

C#: 3.7 seconds
F#: 7.6 seconds

Running the C# solution after the F# solution gives the same performance for the F# version but 4.7 seconds for C# (I assume due to heavy memory allocation by the F# solution). Running each solution alone doesn't change the above results.

Using a file with 6,000,000 lines gives ~7 seconds for the C# solution, while the F# solution produces an OutOfMemoryException (I'm running this on a machine with 12 GB of RAM...)

So for me it seems that the conventional 'wisdom' is true and C# using a simple loop is faster for this kind of task...

Answer 2

You really, really, really, really shouldn't be reading anything into these results - either benchmark your entire system as a form of system test, or remove the disk I/O from the benchmark. It's just going to confuse matters. It's probably better practice to take a TextReader parameter rather than a physical path, to avoid tying the implementation to physical files.

Additionally, as a microbenchmark your test has a few other flaws:

You define numerous functions that aren't called during the benchmark. Are you testing ImportDataC or ImportDataReadLines? Pick one for clarity - and in real applications, don't duplicate implementations, but factor out similarities and define one in terms of the other.
You're calling .Split(',') in F# but .Split() in C# - do you intend to split on commas or on whitespace?
You're reinventing the wheel - at least compare your implementation with the much shorter versions using higher-order functions (aka LINQ).
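As an illustration of the last two points, a shorter version built on LINQ might look roughly like the sketch below. CsvLinq, Lines and Import are hypothetical names, and splitting on commas is assumed to be the intent; the reader-based overload follows the TextReader suggestion above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvLinq
{
    // Lazily enumerates lines from a reader until ReadLine returns null.
    public static IEnumerable<string> Lines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Splits every line on commas and materializes the result as a List.
    public static List<string[]> Import(TextReader reader)
    {
        return Lines(reader).Select(line => line.Split(',')).ToList();
    }

    // Path-based wrapper using File.ReadLines, as ImportDataReadLines does.
    public static List<string[]> Import(string path)
    {
        return File.ReadLines(path).Select(line => line.Split(',')).ToList();
    }
}
```

Whether this is faster or slower than the hand-written loop is exactly the kind of thing the benchmark would have to show; the sketch only makes the comparison possible.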

Answer 3

I note that it looks like your F# is using the F# list whereas your C# is using the .NET List. You might try changing the F# to use another list type for larger data sets.

Answer 1: 5, accepted, 2011-02-02 09:18:15
Answer 2: 5, 2011-02-03 10:56:30
Answer 3: 2, 2011-02-02 09:23:14
