
Extracting unique syscall names from strace output (via regex?)

I have a file produced by strace that contains all the system calls. I want to get the names of those system calls, each listed only once: if mprotect appears 4 times, I only need to list it 1 time.

One method that comes to mind is to use regular expressions, in Python or any other language that supports them, to first find all the system calls and then eliminate the duplicates. To that end, I first tried to test my regular expression with the search feature of Notepad++. I want to match anything like blah( and devised the following regular expression for it:

[a-zA-Z_](

but Notepad++ found nothing. What is the correct regular expression for this?

Why do you think you need regular expressions for this? The output of strace is a sequence of lines, each starting with

<c_identifier>(

and C identifiers can't contain (, so you can just take the part up to the first ( to get the name of the system call. In Python, this one-liner computes the set of distinct system calls:

syscalls = set(ln.split('(', 1)[0] for ln in strace_output)

(You can do this in one line of Awk as well, if you would rather work in the shell than in Python.)
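A slightly fuller sketch of the same split-on-( idea is given here, since real strace output usually also contains lines that are not system calls (signal notifications, the final exit line). The function name unique_syscalls and the sample lines are made up for illustration, and the code assumes the trace has no per-line prefix such as a PID or timestamp.

```python
def unique_syscalls(lines):
    """Return the distinct syscall names found in strace output lines, sorted."""
    names = set()
    for ln in lines:
        # Everything before the first '(' is the candidate name.
        head, sep, _ = ln.partition('(')
        # Skip lines without '(' and lines whose prefix is not a plain
        # identifier (e.g. '--- SIGCHLD ... ---' or '+++ exited with 0 +++').
        if sep and head.isidentifier():
            names.add(head)
    return sorted(names)


# Hypothetical sample lines in the general shape of strace output.
sample = [
    'execve("/bin/true", ["true"], 0x7ffc00000000) = 0',
    'mprotect(0x7f0000000000, 4096, PROT_READ) = 0',
    'mprotect(0x7f0000001000, 4096, PROT_READ) = 0',
    '--- SIGCHLD {si_signo=SIGCHLD} ---',
    'exit_group(0) = ?',
    '+++ exited with 0 +++',
]

print(unique_syscalls(sample))
```

To run it on a real trace, pass an open file instead of the list, for example `with open('trace.txt') as f: print(unique_syscalls(f))`, where trace.txt is a placeholder file name.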

Notepad++ should have told you "invalid regular expression". The latest version does.

In regular expressions, parentheses have special meaning, so you have to escape them:

[a-zA-Z_]\(

will find h( in blah(, since the part in the brackets isn't quantified (as @CharlesDuffy pointed out).

To match the entire blah( , use

[a-zA-Z_]+\(
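The difference between these patterns can be checked with Python's re module. This is only a small sketch: the test strings are made up, and the last pattern, which also allows digits, is an addition that is not part of the answer above (names such as dup2 contain a digit, which [a-zA-Z_]+ does not cover).

```python
import re

line = 'blah(1, 2) = 0'

# Unquantified class: matches a single character followed by '('.
unquantified = re.search(r'[a-zA-Z_]\(', line)
# Quantified class: matches the whole run of letters before '('.
quantified = re.search(r'[a-zA-Z_]+\(', line)

print(unquantified.group())  # h(
print(quantified.group())    # blah(

# The original, unescaped pattern is not a valid regular expression:
# the '(' opens a group that is never closed.
try:
    re.compile(r'[a-zA-Z_](')
    unescaped_is_valid = True
except re.error:
    unescaped_is_valid = False
print(unescaped_is_valid)  # False

# A name containing a digit is not matched by the letters-only pattern,
# because the digit sits between the letters and the '('.
digits_line = 'dup2(3, 4) = 4'
letters_only = re.search(r'[a-zA-Z_]+\(', digits_line)
with_digits = re.search(r'[a-zA-Z_][a-zA-Z0-9_]*\(', digits_line)
print(letters_only)         # None
print(with_digits.group())  # dup2(
```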

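It should be [a-zA-Z_]+\( because parentheses are metacharacters in regular expressions. (The backslash is doubled, as in \\(, only when the pattern is written inside a string literal that itself treats the backslash as an escape character.)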
