
strstr vs regex in C

Let's say, for example, I have user IDs, access times, program names, and version numbers as a list of CSV strings, like this:

1,1342995305,Some Program,0.98
1,1342995315,Some Program,1.20
2,1342985305,Another Program,15.8.3
1,1342995443,Bob's favorite game,0.98
3,1238543846,Something else,
...

Assume this list is not a file, but is an in-memory list of strings.

Now let's say I want to find out how often certain programs have been accessed, broken down by version number (e.g. "Some Program version 1.20" was accessed 193 times, "Some Program version 0.98" was accessed 876 times, and "Some Program 1.0.1" was accessed 1,932 times).

Would it be better to build a regular expression and then use regexec() to find the matches and pull out the version numbers, or strstr() to match the program name plus comma, and then just read the following part of the string as the version number? If it makes a difference, assume I am using GCC on Linux.

Is there a performance difference? Is one method "better" or "more proper" than the other? Does it matter at all?

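Use strstr(). Using a regex to count occurrences is not a good idea, since you would need a loop anyway. I suggest a simple loop that searches for the position of the substring, increments a counter, and moves the search start position forward after each match.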
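A minimal sketch of that loop, assuming the data sits in one NUL-terminated buffer. The name `count_occurrences` is made up for illustration, and matches are counted without overlap:

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in `haystack`.
 * `count_occurrences` is an illustrative name, not a library function. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t len = strlen(needle);
    size_t count = 0;

    if (len == 0)
        return 0;   /* an empty needle would match everywhere */

    /* Find a match, count it, then resume the search just past it. */
    for (const char *p = strstr(haystack, needle);
         p != NULL;
         p = strstr(p + len, needle))
        count++;

    return count;
}
```

For a list of separate strings, the same idea reduces to one strstr() call per record. Note that a plain substring match on "Some Program,1.2" also hits "Some Program,1.20", and "Some Program" also matches inside "Awesome Program", so the needle may need the leading comma and some end-of-field check.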

strchr/memcmp is how most libc versions implemented strstr. Hardware-dependent implementations of strstr in glibc do better. Both SSE2 and SSE4.2 (x86) instruction sets can do way better than scanning byte-by-byte. If you want to see how, I posted a couple blog articles a while back --- SSE2 and strstr and SSE2 and BNDM search --- that you might find interesting.

Use strtok() and break the data up into something more structured (such as a list of structs).
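One way this could look, as a rough sketch. The struct layout, the field sizes, and the name `parse_record` are assumptions chosen for illustration. strtok() writes into the string it parses, so the line must be a writable copy, and it skips empty fields, which is harmless here only because the version is the last field:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout; the field sizes are arbitrary. */
struct access_record {
    long user_id;
    long accessed_at;
    char program[64];
    char version[16];
};

/* Parse one CSV line into `out`. The line is modified by strtok().
 * Returns 1 on success, 0 if a required field is missing. */
int parse_record(char *line, struct access_record *out)
{
    char *id   = strtok(line, ",");
    char *when = strtok(NULL, ",");
    char *name = strtok(NULL, ",");
    char *ver  = strtok(NULL, ",");   /* NULL when the version is empty */

    if (id == NULL || when == NULL || name == NULL)
        return 0;

    out->user_id = strtol(id, NULL, 10);
    out->accessed_at = strtol(when, NULL, 10);
    /* snprintf truncates over-long fields instead of overflowing. */
    snprintf(out->program, sizeof out->program, "%s", name);
    snprintf(out->version, sizeof out->version, "%s", ver != NULL ? ver : "");
    return 1;
}
```

Once the records are structs, counting by program and version is a matter of comparing `program` and `version` with strcmp().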

I'd do neither: I'm betting it would be faster to use strchr() to find the commas, and strcmp() to check the program name.
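A sketch of that approach, with one deviation: the program name inside the record is not NUL-terminated, so this version compares lengths and uses memcmp() rather than strcmp(). The function name `version_if_program` is invented for the example:

```c
#include <stddef.h>
#include <string.h>

/* If the third CSV field of `record` equals `program`, return a pointer
 * to the version field inside `record`; otherwise return NULL.
 * `version_if_program` is an illustrative name. */
const char *version_if_program(const char *record, const char *program)
{
    const char *p = strchr(record, ',');     /* comma after the user id */
    if (p == NULL)
        return NULL;
    p = strchr(p + 1, ',');                  /* comma after the access time */
    if (p == NULL)
        return NULL;

    const char *name = p + 1;
    const char *end = strchr(name, ',');     /* comma after the program name */
    if (end == NULL)
        return NULL;

    size_t len = (size_t)(end - name);
    if (len != strlen(program) || memcmp(name, program, len) != 0)
        return NULL;

    return end + 1;                          /* version runs to end of string */
}
```

Because the whole field is compared, "Some Program" does not match a record for "Awesome Program", which a bare strstr() would.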

As for performance, I expect the string functions (strtok / strstr / strchr / strcmp ...) to all run at more or less the same speed (i.e. really, really fast), and regex to run appreciably slower, albeit still quite fast.

The real performance benefit would come from properly designing the search though: how many times it must run, is the number of programs fixed...?

For example, a single scan that collects ALL the frequency data for all the programs would be much slower than a single scan looking for one given program. But properly designed, all subsequent queries for other programs would run way faster.
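To make the trade-off concrete, here is a rough sketch of the "scan once, query many times" design. Everything in it is an assumption for illustration: the `tally` struct, the fixed `MAX_KEYS` limit, and the use of the "name,version" tail of each record as the key. The linear key search keeps the example short; with many distinct programs a hash table would be the more natural choice.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 256   /* arbitrary limit for this sketch */

struct tally {
    const char *key[MAX_KEYS];   /* "name,version", pointing into the records */
    size_t count[MAX_KEYS];
    size_t used;
};

/* Return a pointer just past the second comma, or NULL if there is none. */
static const char *name_and_version(const char *record)
{
    const char *p = strchr(record, ',');
    if (p != NULL)
        p = strchr(p + 1, ',');
    return p != NULL ? p + 1 : NULL;
}

/* One pass over all records: count every distinct "name,version" tail. */
void tally_records(struct tally *t, const char *const *records, size_t n)
{
    t->used = 0;
    for (size_t i = 0; i < n; i++) {
        const char *key = name_and_version(records[i]);
        if (key == NULL)
            continue;                       /* malformed record, skipped */

        size_t k = 0;
        while (k < t->used && strcmp(t->key[k], key) != 0)
            k++;
        if (k == t->used) {
            if (t->used == MAX_KEYS)
                continue;                   /* table full, key dropped */
            t->key[k] = key;
            t->count[k] = 0;
            t->used++;
        }
        t->count[k]++;
    }
}

/* Later queries only touch the table, not the records. */
size_t tally_lookup(const struct tally *t, const char *key)
{
    for (size_t k = 0; k < t->used; k++)
        if (strcmp(t->key[k], key) == 0)
            return t->count[k];
    return 0;
}
```

After one call to `tally_records`, a query such as `tally_lookup(&t, "Some Program,1.20")` costs a walk over the distinct keys rather than over every record. The keys point into the caller's strings, so the records must outlive the table.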
