
Why does 'git rebase' keep the rebased branch's history?

As I understand from http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html the rebased branch is 'moved' on to another one.

What I see during my tests, though, shows that the commits from the rebased branch remain in the history, so they are effectively duplicated.

Before rebase:

After rebase:

Perhaps I'm missing something or don't understand the purpose of rebase completely or both.

If what I see is the intended behavior, then why is it so?

In short, rebase is a way to apply commits from one part of the tree to a different starting point. It may copy those changes, but will not move them.

Remember that git commits are immutable--once something has a hash, it never changes. That means that when you rebase some changes on top of another change, the hashes are necessarily different, so git will keep around both the old one and the new one.
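This is easy to check in a throwaway repository. The following is a minimal sketch, not the asker's actual setup: the branch name topic, the file names, and the identity settings are assumptions made for illustration. It records the hash of the "add file2" commit before and after a rebase and confirms that the old object is still in the object store.

```shell
set -e
# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name regardless of init defaults
git config user.name "Example"
git config user.email "example@example.com"

echo base > file0 && git add file0 && git commit -q -m "base"
git checkout -q -b topic
echo two > file2 && git add file2 && git commit -q -m "add file2"
git checkout -q master
echo one > file1 && git add file1 && git commit -q -m "add file1"

old=$(git rev-parse topic)      # hash of "add file2" before the rebase
git rebase -q master topic      # checks out topic and replays it onto master
new=$(git rev-parse topic)      # hash of the replayed "add file2"

echo "old: $old"
echo "new: $new"
git cat-file -t "$old"          # the old object still exists; prints "commit"
```

The two hashes differ because the new commit has a different parent, and the parent hash is part of what gets hashed.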

However, if no branch name points to the old commit ("add file2" in your example), then after a few weeks git's automatic garbage collector will remove the old commit from your repository. (Why the wait? That way, if you change your mind, you can retrieve the old commit from git reflog. With default settings the reflog remembers an unreachable commit for 30 days, and git gc only prunes unreachable objects that are more than two weeks old.) Generally, this is a good thing, because it makes it harder to lose data by accident, but if the files involved are extremely large you can use a combination of git prune and git gc to trim away the redundant data.

There are two separate phenomena here.

  1. The screenshot you posted, from gitk, still shows the old commit. That's just the way gitk works; if you reload by hitting Ctrl + F5 rather than just F5 (that's File > Reload rather than File > Update for you mouse users), you'll see the old commit disappear because it's no longer relevant.

  2. There are lots of operations in Git that create commits, and even more that create file (blob) or tree objects in the object store. The fact that many of these objects are no longer used is irrelevant.

    This has a whole bunch of advantages. In your example, it means that if you decided your rebase was a bad idea, your old commit still exists and can be recovered. There's even a handy syntax for it: topic@{1} refers to the commit that topic pointed to before the last time it moved; here that would be immediately before the rebase.

    The Git object model is clever about this sort of thing. Having an extra commit like this lying around takes up very little extra space. For a rebase like the one you're describing, I'd expect holding on to the old branch would cost at most a few hundred bytes.

    Of course, that does add up over time. So git gc (which certain commands run automatically for you every so often) runs git prune, and git prune looks for commits and other objects that are old and no longer reachable and clears them out for you.
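The topic@{1} syntax mentioned in point 2 can be tried out in a scratch repository. This is only a sketch under assumed names (a branch called topic, made-up file names and identity); it rebases topic, looks up where the branch pointed before the rebase, and then moves the branch back there.

```shell
set -e
# Throwaway repository; branch and file names are assumptions for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file0 && git add file0 && git commit -q -m "base"
git checkout -q -b topic
echo two > file2 && git add file2 && git commit -q -m "add file2"
git checkout -q master
echo one > file1 && git add file1 && git commit -q -m "add file1"

old=$(git rev-parse topic)
git rebase -q master topic                 # topic now points at the rewritten commit

previous=$(git rev-parse "topic@{1}")      # where topic pointed before its last move
echo "before rebase: $previous"

git reset -q --hard "topic@{1}"            # undo the rebase (HEAD is on topic here)
restored=$(git rev-parse topic)
echo "after undo:    $restored"
```

Running git reflog show topic in the same repository lists every position the branch has held, which helps when the commit you want is more than one step back.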

None of this means your rebase hasn't worked, just that the idea of rebase "moving" commits is a simplification. What rebase actually does is apply the differences between each commit and its parent to the new branch, and creates a new commit with those differences for each commit on the old branch. It then updates the branch such that, if you look at the branch history, it's as if those commits were moved.
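One way to see that the rebased commit carries the same differences is to compare patch IDs, which git computes from a commit's diff rather than from the commit itself. The sketch below makes the same assumptions as any scratch example (illustrative branch and file names, made-up identity) and uses git patch-id on the output of git show.

```shell
set -e
# Throwaway repository; all names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file0 && git add file0 && git commit -q -m "base"
git checkout -q -b topic
echo two > file2 && git add file2 && git commit -q -m "add file2"
git checkout -q master
echo one > file1 && git add file1 && git commit -q -m "add file1"

old=$(git rev-parse topic)
git rebase -q master topic
new=$(git rev-parse topic)

# git patch-id prints "<patch-id> <commit-id>"; keep only the patch ID.
old_patch=$(git show "$old" | git patch-id | cut -d' ' -f1)
new_patch=$(git show "$new" | git patch-id | cut -d' ' -f1)

echo "commit hashes: $old vs $new"
echo "patch ids:     $old_patch vs $new_patch"
```

The commit hashes differ while the patch IDs match: same change, new commit.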

Rebase is a command that rewrites history. But thanks to git, the old history is not lost right away: you can roll back until the git garbage collector clears those dangling commits.

...the rebased branch is 'moved' on to another one.

That's one way of putting it, but not an entirely accurate one.

The best way to think about a git repo is to think of it as a composition of two things: a directed, acyclic graph of immutable commits, each representing a version of your software (or whatever's in the repo), and a set of branch pointer variables ( master and so on).

Let's say you start with a repo with three commits that looks like this:

a--> b
 \-> c

where the origin/master branch pointer points to b and the master branch pointer points to c. You actually have three different versions of your software here: a, b and c.

If you then decide to rebase c onto b, you will end up with a repo that looks like this:

a--> b--> c'
 \-> c

with the master branch pointer changed to point to c'. "Pushing up this commit" will result in commit c' being sent to the origin repo, the origin repo's master branch pointer being changed to point to c', and your origin/master branch pointer being changed to match it.

You'll note that c' is a different commit from c, which is still present, and you now have four versions of your software. The c' commit makes morally the same change to b that c did to a (or so one hopes, presuming you resolved any conflicts appropriately).
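The a, b, c, c' picture can be reproduced locally. This is a sketch with some shortcuts: there is no real remote, so origin/master is faked by writing the remote-tracking ref directly with git update-ref (an assumption standing in for a fetch), and the file names and identity are made up.

```shell
set -e
# Throwaway repository reproducing the a / b / c diagram.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo a > a.txt && git add a.txt && git commit -q -m "a"
a=$(git rev-parse HEAD)

echo b > b.txt && git add b.txt && git commit -q -m "b"
b=$(git rev-parse HEAD)
git update-ref refs/remotes/origin/master "$b"   # stand-in for a fetched origin/master

git reset -q --hard "$a"                         # move master back so c forks from a
echo c > c.txt && git add c.txt && git commit -q -m "c"
c=$(git rev-parse HEAD)

git rebase -q origin/master                      # replay c on top of b
c_prime=$(git rev-parse master)

# Show both lines of history: master (a, b, c') and the old commit c.
git log --oneline --graph master "$c"
```

Here git rev-parse "$c_prime^" resolves to b and git rev-parse "$c^" resolves to a, matching the second diagram.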

c no longer has any branch pointers pointing to it (well, outside of the reflog, actually), and so will be garbage collected at some point later during normal git operation.
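To watch that collection happen without waiting, you can expire the reflog and prune immediately. The sketch below is meant for a scratch repository only, since it deliberately throws away the safety net described above; the branch name topic, the file names, and the identity are assumptions for illustration.

```shell
set -e
# Throwaway repository; do not run the expiry commands on a repo you care about.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > file0 && git add file0 && git commit -q -m "base"
git checkout -q -b topic
echo two > file2 && git add file2 && git commit -q -m "add file2"
git checkout -q master
echo one > file1 && git add file1 && git commit -q -m "add file1"

old=$(git rev-parse topic)
git rebase -q master topic

# Right after the rebase the old commit is still stored.
if git cat-file -e "$old" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all   # drop the reflog entries that still mention it
git gc -q --prune=now                  # prune unreachable objects without the grace period

if git cat-file -e "$old" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
```

With default settings neither command would remove anything this soon; the reflog and prune grace periods are what give you time to change your mind.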

(Git also performs some fancy compression tricks to store all these different [and complete] versions of your software in less space than if they were all individually checked out, but that's not really something you need to, or even should, bother thinking about.)

In casual talk we refer to this operation as "changing the master branch," but really, what you're doing is creating a new branch and changing what master refers to from the old branch to the new.
