C slower than Java: why?

Question

I quickly wrote a C program that extracts the i-th line of a set of gzipped files (each containing about 500,000 lines). Here is my C program:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <zlib.h>

/* compilation:
gcc  -o linesbyindex -Wall -O3 linesbyindex.c -lz
*/
#define MY_BUFFER_SIZE 10000000
static void extract(long int index,const char* filename)
   {
   char buffer[MY_BUFFER_SIZE];
   long int curr=1;
   gzFile in=gzopen (filename, "rb");
   if(in==NULL)
       {
       fprintf(stderr,"Cannot open \"%s\" %s.\n",filename,strerror(errno));
       exit(EXIT_FAILURE);
       }
   while(gzread(in,buffer,MY_BUFFER_SIZE)!=-1 && curr<=index)
       {
       char* p=buffer;
       while(*p!=0)
           {
           if(curr==index)
               {
               fputc(*p,stdout);
               }
           if(*p=='\n')
               {
               ++curr;
               if(curr>index) break;
               }
           p++;
           }
       }
   gzclose(in);
   if(curr<index)
       {
       fprintf(stderr,"Not enough lines in %s (%ld)\n",filename,curr);
       }
   }

int main(int argc,char** argv)
   {
   int optind=2;
   char* p2;
   long int count=0;
   if(argc<3)
       {
       fprintf(stderr,"Usage: %s (count) files...\n",argv[0]);
       return EXIT_FAILURE;
       }
   count=strtol(argv[1],&p2,10);
   if(count<1 || *p2!=0)
       {
       fprintf(stderr,"bad number %s\n",argv[1]);
       return EXIT_SUCCESS;
       }
   while(optind< argc)
       {
       extract(count,argv[optind]);
       ++optind;
       }
   return EXIT_SUCCESS;
   }

As a test, I wrote the following equivalent code in Java:

import java.io.*;
import java.util.zip.GZIPInputStream;

public class GetLineByIndex{
   private int index;

   public GetLineByIndex(int count){
       this.index=count;
   }

   private String extract(File file) throws IOException
       {
       long curr=1;
       byte buffer[]=new byte[2048];
       StringBuilder line=null;
       InputStream in=null;
       if(file.getName().toLowerCase().endsWith(".gz")){
           in= (new GZIPInputStream(new FileInputStream(file)));
       }else{
           in= (new FileInputStream(file));
       }
       int nRead=0;
       while((nRead=in.read(buffer))!=-1)
           {
           int i=0;
           while(i<nRead)
               {
               if(buffer[i]=='\n')
                   {
                   ++curr;
                   if(curr>this.index) break;
                   }
               else if(curr==this.index)
                   {
                   if(line==null) line=new StringBuilder(500);
                   line.append((char)buffer[i]);
                   }
               i++;
               }
           if(curr>this.index) break;
           }
       in.close();
       return (line==null?null:line.toString());
       }

   public static void main(String args[]) throws Exception{
       int optind=1;
       if(args.length<2){
           System.err.println("Usage: program (count) files...\n");
           return;
       }
       GetLineByIndex app=new GetLineByIndex(Integer.parseInt(args[0]));

       while(optind < args.length)
           {
           String line=app.extract(new File(args[optind]));
           if(line==null)
               {
               System.err.println("Not enough lines in "+args[optind]);
               }
           else
               {
               System.out.println(line);
               }
           ++optind;
           }
       return;
   }
}

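It turns out that the Java program is much faster (~1'45'') at fetching a large index than the C program (~2'15'') on the same machine (I ran the test several times).

How can I explain this difference?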

Answer 1

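The most likely explanation for the Java version being faster than the C version is that the C version is incorrect.

After fixing the C version, I obtained the following results (which contradict your claim that Java is faster than C):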

Java 1.7 -client: 65 milliseconds (after JVM warmed up)
Java 1.7 -server: 82 milliseconds (after JVM warmed up)
gcc -O3:          37 milliseconds

The task was to print the 200000th line of the file words.gz. The file words.gz was generated by gzipping /usr/share/dict/words.

...
static char buffer[MY_BUFFER_SIZE];
...
ssize_t len;
while((len=gzread(in,buffer,MY_BUFFER_SIZE)) > 0  &&  curr<=index)
    {
    char* p=buffer;
    char* endp=buffer+len;
    while(p < endp)
       {
...
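
To spell out why the bounds matter: gzread returns the number of uncompressed bytes it actually read (fewer than requested, and eventually 0, at end of file) or -1 on error, and it does not NUL-terminate the buffer. A loop that only checks for -1 and scans until *p==0 can therefore walk over stale or uninitialised bytes. Below is a minimal sketch of the bounded scan, kept free of zlib so it compiles on its own. The helper name scan_chunk and its parameters are hypothetical, not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post).
 * Scans exactly `len` bytes of one chunk, never relying on a NUL terminator.
 * `*curr` is the 1-based number of the line the chunk starts in; it is
 * updated so the next chunk can continue where this one stopped.
 * Bytes of line `index` (including its '\n') are written to `out`.
 * Returns 1 once line `index` has been fully emitted, 0 otherwise. */
static int scan_chunk(const char *buf, size_t len, long index,
                      long *curr, FILE *out)
{
    const char *p = buf;
    const char *endp = buf + len;   /* one past the last valid byte */

    assert(buf != NULL && curr != NULL && out != NULL);
    while (p < endp) {
        if (*curr == index)
            fputc(*p, out);
        if (*p == '\n') {
            ++*curr;
            if (*curr > index)
                return 1;           /* target line is complete */
        }
        p++;
    }
    return 0;                       /* need another chunk */
}
```

In the real program the caller would pass the value returned by gzread as len and stop reading when that value is 0 or negative, or when scan_chunk returns 1.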

Answer 2

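Because fputc() isn't very fast, and you are adding stuff to your output file char by char.

Calling fputc_unlocked, or better still delimiting the stuff you want to add and calling fwrite() once, should be faster.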
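
A rough sketch of the fwrite() idea, assuming the caller has already located the first byte of the wanted line inside the current chunk. The helper name emit_line_part is hypothetical. It finds the end of the line with memchr and writes the whole span in one call instead of one fputc per character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post).
 * `buf` points at the part of the wanted line that lies in this chunk and
 * `len` is the number of valid bytes from there to the end of the chunk.
 * Writes the bytes up to and including the first '\n' with a single fwrite.
 * Returns 1 if the line ended in this chunk, 0 if it continues in the
 * next chunk, and -1 on a write error. */
static int emit_line_part(const char *buf, size_t len, FILE *out)
{
    const char *nl;
    size_t n;

    assert(buf != NULL && out != NULL);
    nl = memchr(buf, '\n', len);               /* end of line, if present */
    n = nl ? (size_t)(nl - buf) + 1 : len;     /* bytes belonging to the line */
    if (n > 0 && fwrite(buf, 1, n, out) != n)
        return -1;
    return nl != NULL;
}
```

When the function returns 0 the line crosses a chunk boundary, so the caller would read the next chunk and call it again from the start of that chunk.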

Answer 3

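Well, your programs are doing different things. I didn't profile your programs, but from looking at your code I suspect this difference:

To build the line, you use this in Java: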

if(curr==this.index)
{
    if(line==null) line=new StringBuilder(500);
    line.append((char)buffer[i]);
}

And this in C:

if(curr==index)
{
    fputc(*p,stdout);
}

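That is, you're printing one character at a time to stdout. stdout is buffered by default, but I suspect it is still slower than the 500-character buffer you use in Java.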

Answer 4

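I don't have deeper insight into the optimizations the compiler performs, but I guess that is what makes the difference between your programs. Microbenchmarks like this are very, very, very hard to get right and meaningful. Here is an article by Brian Goetz that elaborates on this: http://www.ibm.com/developerworks/java/library/j-jtp02225/index.html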

Answer 5

Very large buffers can be slower. I suggest making the buffers the same size, i.e. 2 or 8 KB.
