
Search for a pattern using a regular expression and libgit2

I have an application which spawns a process to find occurrences of a particular regex within a specific commit in a git repository by running:

git grep -G pattern revision

This works just fine, but I do it in a loop, and that is extremely slow. I profiled the code on Linux, and the call to __libc_fork alone takes 94% of the run time.

Obviously, I'd like to avoid this unnecessary overhead. I'm already using libgit2 in my application for some other git operations, but I don't see a convenient way to perform a regular expression search with it like I can with git grep. I can imagine manually going through all the files associated with a commit and searching each one, but I was hoping for a more elegant solution, ideally just a few lines.

Am I missing a relevant libgit2 API? Does anyone know of a quick way to search for a pattern using libgit2?

EDIT Just to clarify: in my loop, the revision is fixed, but the pattern changes.

libgit2 does not have a git grep equivalent, since that's nowhere near a basic Git operation. It's very high level and the actual interesting work (efficient grep) has nothing to do with Git, so libgit2 would be a bad place to put that code.

Since the issue you see comes down to forking being more expensive than anything else, I see two ways to avoid it. One is to use git cat-file's --batch option and feed it a list of objects to show, which you can get e.g. from ls-tree, like

git ls-tree -r ${revision} | cut -f 1 | cut -d ' ' -f 3 | git cat-file --batch

which produces machine-readable output with a header line holding an $id $type $len triplet before the contents of each object (it might be easier/cheaper to replace those cuts with your own code that extracts the ids from the stream coming from ls-tree). Or you can use libgit2 to walk the commit's tree recursively and grab all the blobs, which would get you the same information in a slightly different way.
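As a rough illustration of the first approach, here is a minimal Python sketch of the two parsing steps: pulling blob ids out of ls-tree output (in place of the cuts) and splitting a cat-file --batch stream into objects using the triplet headers. The function names parse_ls_tree and iter_batch_objects are made up for this example, and it assumes the default ls-tree line layout ("mode type id", a tab, then the path) with no quoted paths.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple


def parse_ls_tree(output: bytes) -> List[Tuple[str, str]]:
    """Return (object id, path) pairs for blobs listed by `git ls-tree -r`."""
    entries = []
    for line in output.split(b"\n"):
        if not line:
            continue
        # Each line looks like "<mode> <type> <id>\t<path>".
        meta, path = line.split(b"\t", 1)
        _mode, obj_type, oid = meta.split(b" ")
        if obj_type == b"blob":  # skip submodule ("commit") entries
            entries.append((oid.decode(), path.decode(errors="replace")))
    return entries


def iter_batch_objects(stream: BinaryIO) -> Iterator[Tuple[str, str, bytes]]:
    """Yield (id, type, contents) for each object in `git cat-file --batch` output."""
    while True:
        header = stream.readline()
        if not header:
            return  # end of stream
        fields = header.split()
        if len(fields) != 3:
            continue  # e.g. "<id> missing" for an unknown object
        oid, obj_type, size = fields[0].decode(), fields[1].decode(), int(fields[2])
        data = stream.read(size)  # exactly the number of bytes the header promised
        stream.read(1)            # consume the newline that follows the contents
        yield oid, obj_type, data


if __name__ == "__main__":
    # Hand-written sample data standing in for real git output.
    sample = io.BytesIO(b"abc123 blob 6\nfoo\nba\ndef456 blob 3\nxyz\n")
    for oid, obj_type, data in iter_batch_objects(sample):
        print(oid, obj_type, len(data))
```

In a real program the stream would be the stdout of a single long-lived git cat-file --batch child process, so the fork cost is paid once rather than once per pattern.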

Then you can use some form of grep to run over these buffers. Your favourite programming language probably has an implementation of pcre or bindings to that library which you can feed these files.
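Because the revision in the question is fixed and only the pattern changes, one option is to load the blobs once and run each new pattern over the in-memory buffers. The sketch below shows that idea with Python's built-in re module; grep_blobs and the path-to-bytes dictionary are hypothetical names for this example. Note that re uses Python's regex syntax, not the POSIX basic syntax selected by git grep -G, so patterns may need adjusting.

```python
import re
from typing import Dict, Iterator, Tuple


def grep_blobs(blobs: Dict[str, bytes], pattern: str) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (path, line number, line) for every line matching pattern."""
    regex = re.compile(pattern.encode())  # compile once per pattern
    for path, data in blobs.items():
        # Search line by line so that matches can be reported like grep does.
        for lineno, line in enumerate(data.splitlines(), start=1):
            if regex.search(line):
                yield path, lineno, line


if __name__ == "__main__":
    # Hand-written sample contents standing in for blobs read from the commit.
    cached = {
        "a.txt": b"hello\nworld\n",
        "b.txt": b"no match here\nhello again\n",
    }
    # The blobs are read once; only the pattern varies inside the loop.
    for pattern in (r"^hello", r"wor.d"):
        for path, lineno, line in grep_blobs(cached, pattern):
            print(f"{pattern!r}: {path}:{lineno}: {line.decode()}")
```

Keeping every blob in memory trades RAM for speed, which may not suit very large repositories.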

Whichever extraction method you choose, you should be able to feed the files to the regex engine one at a time; with cat-file, the triplet that precedes each object tells you how many bytes to read, so you can consume the output one object at a time.
