Squash master branch history but keep all prior commit messages in blame?

Question

I have been researching git squash but am unsure if it would even apply to what I am trying to do. I only know the basics of git so maybe this is a ridiculous thing to do.

I have a master branch that has, let's say, 10,000 commits on it. Let's assume it looks like this:

1-2-3...5000...9999-10000

Let's assume every file in the repo has been modified at various points in time. For example, the file "test.php" has commits at 1, 2, 3, 2500 and 6000.

What I want to do is make the entire history of the master branch start at commit 5000 but keep the commit logs intact. Is this even possible?

For example using blame on "test.php":

<commit 1> | echo 'hello world';
<commit 2> | echo 'another line in the file';
<commit 6000> | echo 'sometime later';

My reason for wanting this is simple. At this point I will never roll the code back beyond commit 5000, but it would be great to still see who made each change. It would also reduce the size of a checkout, which at this point is very large.

Answer 1

Git gets the information it shows when annotating lines in source files from the commit history. So if you get rid of parts of the commit history, that information isn't available to git anymore.

Just because you don't want to roll back before some commit isn't a reason to give up the history. There are many more reasons to keep it, with git annotate being just one of them.

So the only problem you are trying to solve seems to be the amount of data that needs to be transferred when cloning. You can reduce this by using the --depth option to git clone to create a shallow clone. This way, the history will still be available in some remote, but you choose yourself how much of the history you want to copy to your clone.

A shallow clone is also a good way to determine how much space you could save by squashing the history. Note that --depth saves space in two different ways: it clones only the single branch that HEAD on the remote currently points to, and it clones that branch only to a certain depth. You can add the --no-single-branch option to get more comparable numbers when judging whether squashing the history is worth it. Most often, it's not.

To test the effect of --depth locally, you could run:

git clone --no-local --no-hardlinks --no-single-branch --depth 100 path/to/repository path/to/clone

This will create a shallow clone of your local repository while overriding the usual local optimisations. You can then compare the total space consumption using:

du -sm path/to/repository
du -sm path/to/clone
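
One thing worth checking before relying on a shallow clone is how blame behaves inside it. The sketch below is only an illustration under assumptions: the repository names (full, shallow), the three authors and the file contents are all made up. It shows that a shallow clone attributes lines older than its cut-off to the oldest commit it has, and that fetching the rest of the history with git fetch --unshallow restores the original attribution.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory; every name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Build a small repository in which three authors each append one line.
git init -q full
for name in alice bob carol; do
    echo "echo 'line by $name';" >> full/test.php
    git -C full add test.php
    GIT_AUTHOR_NAME="$name" GIT_AUTHOR_EMAIL="$name@example.com" \
    GIT_COMMITTER_NAME="$name" GIT_COMMITTER_EMAIL="$name@example.com" \
        git -C full commit -q -m "$name adds a line"
done

# Keep only the two most recent commits in the clone.
# --no-local forces the regular transport so that --depth is honoured.
git clone -q --no-local --depth 2 full shallow

authors() {
    # Print the author blamed for each line of test.php, in file order.
    git -C "$1" blame --line-porcelain test.php |
        sed -n 's/^author //p' | paste -sd ' ' -
}

full_authors=$(authors full)
shallow_authors=$(authors shallow)
shallow_commits=$(git -C shallow rev-list --count HEAD)
echo "full repository blames: $full_authors"
echo "shallow clone blames:   $shallow_authors"
echo "commits in shallow clone: $shallow_commits"

# Fetching the missing history brings the original attribution back.
git -C shallow fetch -q --unshallow
restored_authors=$(authors shallow)
echo "after --unshallow:      $restored_authors"
```

Because the full history stays on the remote, nothing is lost permanently: the shallow clone only hides older attribution until the history is fetched.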

Answer 2

Squashing commits rewrites history: you will be the one who made that change, and you will be the one appearing in blame. If a thousand people made changes and one person decided to squash all of those changes into one big change, who made that one big change? The one person who squashed everything. Git no longer knows about the individual changes before that point because the history was rewritten.

In other words, that is not possible.
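
To see this for yourself, here is a minimal sketch on a throwaway repository. The authors (alice, bob, carol), the file contents and the commit messages are made up for illustration, and git commit-tree followed by git reset --hard is just one of several ways to produce a single squashed commit.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository; every name below is hypothetical.
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

# Two authors each contribute one line to test.php.
for name in alice bob; do
    echo "echo 'line by $name';" >> test.php
    git add test.php
    GIT_AUTHOR_NAME="$name" GIT_AUTHOR_EMAIL="$name@example.com" \
    GIT_COMMITTER_NAME="$name" GIT_COMMITTER_EMAIL="$name@example.com" \
        git commit -q -m "$name adds a line"
done

authors() {
    # Print the author blamed for each line of test.php, in file order.
    git blame --line-porcelain test.php |
        sed -n 's/^author //p' | paste -sd ' ' -
}

before=$(authors)

# Squash: create one parentless commit that carries the current tree,
# authored by carol, and move the current branch to it.
squashed=$(GIT_AUTHOR_NAME=carol GIT_AUTHOR_EMAIL=carol@example.com \
    GIT_COMMITTER_NAME=carol GIT_COMMITTER_EMAIL=carol@example.com \
    git commit-tree -m "Squash all history" "HEAD^{tree}")
git reset -q --hard "$squashed"

after=$(authors)
commit_count=$(git rev-list --count HEAD)
echo "blame before squash: $before"
echo "blame after squash:  $after"
echo "commits after squash: $commit_count"
```

You could paste the old log messages into the message of the squashed commit, but blame would still point every line at that single commit and its author.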

solution1
5 ACCEPTED 2015-09-09 21:23:54

solution2
3 2015-09-09 21:10:52
