git diff --name-only and SHA

I need to run git diff between two commits and the result should be files introduced with the corresponding SHA.

Currently I can do:

git diff --name-only [start-sha] [end-sha]

which gives:

myfolder/a.txt
code/snippet.c
test.txt

But how do I get the SHA of the commit in which each of those files was modified/added, so that the resulting output becomes:

myfolder/a.txt     2314d344
code/snippet.c     gfhr76kl
test.txt           jk5534bf

?

Figure I could just do:

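# PowerShell: list the changed files, then look up the last commit touching each
$files = git diff --name-only [start-sha] [end-sha]
foreach ($f in $files) {
     # "--" separates the path from revisions, in case a file name looks like a ref
     $sha = git log -n1 --format=format:%H -- $f
     Write-Output "$f $sha"
}

which gets the last commit/SHA in which the file $f was modified.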

Not sure if there is an internal git command to do this, but you could always just loop over your files and grab the first commit:

for file in `git diff [start-sha] [end-sha] --name-only`; 
    do echo -n "$file "; git log --pretty=short [start-sha] [end-sha] -- $file | /bin/grep commit | cut -b8- | head -n 1;
done

Your question conceals a fundamental issue. Let's take a look at a typical commit graph fragment:

...--E--F--G--H--I--J  <-- master

You pick two commits, such as E and I, and run git diff --name-only (or git diff --name-status) to compare them:

$ git diff --name-only <hash-E> <hash-I>
myfolder/a.txt
code/snippet.c
test.txt

but then say:

... the result should be files introduced with the corresponding SHA.

The fact that these file names come out means that all three files are present in commits E and/or I, but if you had commit E in your work-tree and wanted to modify it to get commit I, those three files would need some kind of change: create, modify, or even delete. Using --name-status will give you the kind of change as well: A for "the file is to be newly created", M for "the file is to be modified", and D for "the file is to be deleted". (There are, of course, probably dozens or thousands of files in both E and I that are the same and hence are not printed here.)
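As a rough illustration of those status letters, here is a small sketch that turns the --name-status output into words. The function name describe_changes is made up for this example, and --no-renames is passed so that only plain A/M/D entries (one path per line) come out; anything else is reported as "other".

```sh
#!/bin/sh
# describe_changes START END
# Hypothetical helper (not a Git command): prints one line per changed path,
# saying what would have to happen to turn commit START into commit END.
describe_changes() {
    # --name-status prints "<letter><TAB><path>"; --no-renames keeps it to one path per line.
    git diff --name-status --no-renames "$1" "$2" |
    while IFS="$(printf '\t')" read -r status path; do
        case $status in
            A) what="created" ;;
            M) what="modified" ;;
            D) what="deleted" ;;
            *) what="other ($status)" ;;
        esac
        printf '%s\t%s\n' "$what" "$path"
    done
}
```

You would call it as describe_changes <hash-E> <hash-I>.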

But now you ask for the corresponding hash where this change is introduced. There may not be a single such hash: there might be more than one. (There must be at least one, of course.) For instance, test.txt might be removed entirely in F by mistake (instead of being corrected), put back intact-but-wrong in H, and then corrected in I. Meanwhile code/snippet.c might be modified in both G and I.

Which commit hash(es) would you like for each file? The answer to that determines how to find them. (Of course, if there is only one such hash, the problem vanishes.)

xxfelixxx's answer gives a (slightly faulty but easily fixed and improved) method for obtaining one commit: the first one git log prints. To fix the one bug and improve it a bit, replace the do sequence with:

do echo -n "$file "; git rev-list <starthash>..<endhash> -- "$file" | head -1

That is, we want to find one of the commit hashes printed by git rev-list when run only over the specified start/stop points and looking for changes to the one file. We need <starthash>..<endhash> to do the initial commit-limiting, and the -- $file to select only commits that add, modify, or remove that one path. Note that if there are spaces in the file name you will need to quote it (so I did), although then reading the output of git diff itself gets tricky too.
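Putting the loop and the fix together, a minimal sketch could look like this. The name last_commit_per_file is just something I made up, and reading git diff's output line by line assumes no file name contains a newline or another character that Git would quote (plain spaces are fine); for fully general names you would want git diff -z and a NUL-aware reader instead.

```sh
#!/bin/sh
# last_commit_per_file START END
# Hypothetical helper (not a Git command): for each path that differs between
# START and END, print the path and the newest commit in START..END touching it.
last_commit_per_file() {
    git diff --name-only "$1" "$2" |
    while IFS= read -r file; do
        # rev-list lists newest first; -n 1 stops after the first match,
        # which has the same effect as piping through head -1.
        sha=$(git rev-list -n 1 "$1..$2" -- "$file")
        printf '%s %s\n' "$file" "$sha"
    done
}
```

Run it as last_commit_per_file <starthash> <endhash> from inside the repository.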

Using head -1 gets you the most recent commit that touched the file, e.g., in our example where code/snippet.c is modified in both G and I, you get the hash for commit I. This is because Git works backwards, from newer commits to older ones. If you want the first one, use tail -1, and if you want all of them, you will need a fancier format. :-)

(There is another subtle difference here, between git log and git rev-list, involving merge commits, but that probably won't affect you.)
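For the "all of them" case, one possible format is a line per file followed by every matching hash, newest first. This is only a sketch: all_commits_per_file is a made-up name, and it carries the same assumption as before that file names contain no newlines or characters Git would quote.

```sh
#!/bin/sh
# all_commits_per_file START END
# Hypothetical helper (not a Git command): for each path that differs between
# START and END, print "path: hash hash ..." with hashes from newest to oldest.
all_commits_per_file() {
    git diff --name-only "$1" "$2" |
    while IFS= read -r file; do
        # Join rev-list's one-hash-per-line output into a single line.
        hashes=$(git rev-list "$1..$2" -- "$file" | tr '\n' ' ')
        # tr leaves one trailing space; strip it before printing.
        printf '%s: %s\n' "$file" "${hashes% }"
    done
}
```

Adding --reverse to the git rev-list call would list the hashes oldest first instead.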
