
Can C++ program string search as fast as and/or faster than python?

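I am not sure why string searching in a program I wrote in Python runs faster than in a program I wrote in C++. Is there a trick I am missing?

Generating the use case

This is for a single-line use case, but in the real use case I care about multiple lines.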

#include <fstream>

int main() {
   // Write 50,000 lines; only line 43268 ends in " care".
   std::ofstream testfile("testfile.txt");
   for (unsigned int line_idx = 0; line_idx < 50000u; line_idx++)
   {
      if (line_idx != 43268u)
      {
         testfile << line_idx << " dontcare" << '\n';
      }
      else
      {
         testfile << line_idx << " care" << '\n';
      }
   }
   return 0;   // the file is flushed and closed when testfile goes out of scope
}

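Regex: both programs use the pattern ^(\d*)\s(care)$ (written as "^(\\d*)\\s(care)$" in a C++ string literal).

The C++ program takes 13.954 seconds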
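To make the pattern concrete, here is a minimal sketch of what it captures on a single line. The helper name parse_care_line is hypothetical and not part of the original programs; it only illustrates the two capture groups.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns true when `line`
// has the form "<digits><whitespace>care" and stores the digits in number_out.
bool parse_care_line(const std::string& line, std::string& number_out) {
    // Same pattern as in the question; built once and reused across calls.
    static const std::regex pattern("^(\\d*)\\s(care)$");
    std::smatch matches;
    if (!std::regex_search(line, matches, pattern)) {
        return false;
    }
    number_out = matches[1].str();   // group 1: the leading digits
    return true;                     // group 2 is always the literal "care"
}
```

Called on "43268 care" this yields the digits "43268", while a line such as "43267 dontcare" is rejected because "care" must directly follow a single whitespace character.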

#include <ctime>
#include <fstream>
#include <iostream>
#include <regex>    // was missing: needed for regex, smatch, regex_search
#include <string>
using namespace std;

int main() {
   ifstream testfile("testfile.txt", ios_base::in);
   bool found = false;
   string line;
   regex ptrn("^(\\d*)\\s(care)$");   // compiled once, outside the loop

   std::clock_t start = std::clock();   /* Debug time */
   while (getline(testfile, line))
   {
      std::smatch matches;
      if (regex_search(line, matches, ptrn))
      {
         found = true;
      }
   }
   testfile.close();
   double duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
   std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
   std::cout << " Total time: " << duration << std::endl;
   return 0;
}

The Python program takes 0.02200 seconds

import sys, os       # to navigate and open files
import re            # to search file
import time          # to benchmark

ptrn  = re.compile(r'^(\d*)\s(care)$', re.MULTILINE)

start = time.time()
with open('testfile.txt','r') as testfile:
   filetext = testfile.read()
   matches = re.findall(ptrn, filetext)
   # Parentheses are required: without them the expression prints just "No" on failure.
   print("Found? " + ("Yes" if len(matches) == 1 else "No"))

end = time.time()
print("Total time", end - start)

With Ratah's suggestion implemented: 8.923 seconds

Reading the file into a single string saves about 5 seconds.

   // Replaces the body of main() in the C++ program above.
   double duration;
   std::clock_t start;
   ifstream testfile("testfile.txt", ios_base::in);
   bool found = false;
   regex ptrn("^(\\d*)\\s(care)$");
   std::smatch matches;

   start = std::clock();   /* Debug time */
   // Read the whole file into one string (std::istreambuf_iterator is in <iterator>).
   std::string test_str((std::istreambuf_iterator<char>(testfile)),
                 std::istreambuf_iterator<char>());

   // Note: whether ^ and $ match at line breaks inside test_str depends on
   // the standard library implementation.
   if(regex_search(test_str, matches, ptrn))
   {
      found = true;
   }
   testfile.close();
   duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
   std::cout << "Found? " << (found ? "yes" : "no") << std::endl;
   std::cout << " Total time: " <<  duration << std::endl;
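Since the real use case involves more than one matching line, the single-string approach can be extended to collect every match, roughly like Python's re.findall with re.MULTILINE. The following is only a sketch under assumptions: find_care_lines is a hypothetical helper, and it walks the buffer line by line with regex_match so that it does not rely on how a given standard library treats ^ and $ inside a multi-line string.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): scans an in-memory buffer
// and returns the digits of every line of the form "<digits><whitespace>care".
std::vector<std::string> find_care_lines(const std::string& text) {
    // No ^ or $ here: regex_match must consume the whole line anyway.
    static const std::regex pattern("(\\d*)\\s(care)");
    std::vector<std::string> numbers;
    const char* const base = text.data();
    std::size_t begin = 0;
    while (begin < text.size()) {
        std::size_t end = text.find('\n', begin);
        if (end == std::string::npos) {
            end = text.size();           // last line without a trailing newline
        }
        std::cmatch matches;
        // Match directly on the buffer range, so no per-line string is copied.
        if (std::regex_match(base + begin, base + end, matches, pattern)) {
            numbers.push_back(matches[1].str());
        }
        begin = end + 1;
    }
    return numbers;
}
```

Passing the contents of test_str to this helper would return one entry per matching line. Lines ending in "\r\n" would need the trailing '\r' stripped first, which this sketch does not do.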

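Following UKMonkey's comment, I reconfigured the project as a Release build, which also enables /O2, and that brought the time down to 0.086 seconds.

Thanks to Jean-Francois Fabre, Ratah, and UKMonkey.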
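When comparing Debug and Release timings against the Python figure, it may help to measure wall-clock time on both sides, since std::clock is specified to report processor time while Python's time.time() reports wall-clock time. A minimal sketch, assuming a hypothetical helper named time_seconds:

```cpp
#include <chrono>

// Hypothetical helper (not from the original post): runs `work` once and
// returns the elapsed wall-clock time in seconds.
template <typename Work>
double time_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The search loop can be passed in as a lambda, for example time_seconds([&] { /* read file and search */ }), and the same binary can then be timed in each build configuration.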
