Is there a difference in outcome between fast-forward-merge-commit and a cherry-pick from a side branch?

Question

I would like to have a clean Git history without any merge commits. The repository is used by a team. I thought of two different strategies:

After a developer has finished working on their side-branch, they squash all of their commits, they rebase against dev and resolve all conflicts, and then they cherry-pick their side-branch into dev and push.
After a developer has finished working on their side-branch, they squash all of their commits, they rebase against dev and resolve all conflicts, and then they merge into dev using fast-forward-merge-commit.

Would there be any difference in the resulting Git history on dev? If yes, why?

Answer 1

Addressing just the title question itself: yes, there is a huge difference:

Cherry-pick works by copying a commit. The new-and-improved copy has a different hash ID, and you now have two commits that "do the same thing" but have different hash IDs.
A fast-forward operation moves a branch name to a commit hash ID. No commits change; no new commits appear; one branch name that used to name some old commit, now names some newer commit. For this operation to be a fast-forward, the old commit (the one the name selected before) must be an ancestor of the new commit (the one the name selects now).
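
To make the contrast concrete, here is a minimal Python sketch of a toy model. It is not how Git is implemented: the names `make_commit` and `cherry_pick` are hypothetical, the ID is a shortened SHA-1 over a simplified string, and the ancestor check a real fast-forward requires is left out for brevity.

```python
import hashlib

objects = {}  # toy object database: commit ID -> (parent ID, change)

def make_commit(parent, change):
    """Store a commit; its ID depends on both the parent and the change."""
    cid = hashlib.sha1(f"{parent}|{change}".encode()).hexdigest()[:7]
    objects[cid] = (parent, change)
    return cid

def cherry_pick(cid, onto):
    """Copy the change from `cid` into a NEW commit whose parent is `onto`."""
    return make_commit(onto, objects[cid][1])

h = make_commit(None, "base")
p = make_commit(h, "teammate's work")   # tip of dev
i = make_commit(h, "my feature")        # tip of the side branch
names = {"dev": p, "side": i}

copy = cherry_pick(i, names["dev"])     # adds an object; `i` still exists
count_after_pick = len(objects)

names["dev"] = copy                     # fast-forward: only the name moves
count_after_ff = len(objects)
```

The cherry-pick grows the object database by one commit with a different ID, while the fast-forward leaves the database alone and only changes what `dev` names.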

Your two sub-questions are quite different, though, as they involve running various commands in one repository, then running git push to deliver commits from that one repository, to a second repository, followed by asking the second repository to move a branch name.

Would there be any difference in the outcome git history on dev?

Let me start with three points. We'll need these to make sense of the real answers you need here.

History = commits

Git's history is nothing more or less than the commits in the repository, as found by some name(s). The git log command works by:

finding a commit, or multiple commits, as specified on the command line: the default is the HEAD commit;
putting this commit's hash ID into a queue (specifically a priority queue, but this only matters when there are multiple commits in the queue);

then looping:

take the next commit out of the queue;
print it, or suppress the printing in some fancy cases;
add the commit's parent or parents to the queue, or don't in some fancy cases;
repeating this loop until the queue is empty.

This depends on the parent(s) of each commit to find the history, and is how and why Git works backwards. A branch name simply locates the last commit in some chain of commits. As long as there are no actual merge commits in the chain, and you kick the process off with a single commit, the queue only ever has one commit in it (or zero commits, temporarily, while processing each commit and upon reaching the very first commit ever, which by definition has no parent).
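
The loop above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `history` dictionary, the integer timestamps, and the function name `log` are made up, F is treated as the root for brevity, and ordering the queue by timestamp is my stand-in for the priority that git log uses by default.

```python
import heapq

# Toy history: commit -> (timestamp, list of parents). Letters stand in for hash IDs.
history = {
    "F": (1, []),
    "G": (2, ["F"]),
    "H": (3, ["G"]),
}

def log(start, history):
    """Return the commits reachable from `start`, newest first."""
    queue = [(-history[c][0], c) for c in start]  # negate: heapq is a min-heap
    heapq.heapify(queue)
    seen = set(start)
    shown = []
    while queue:
        _, commit = heapq.heappop(queue)          # take the next commit out
        shown.append(commit)                      # "print" it
        for parent in history[commit][1]:         # add its parent(s) to the queue
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(queue, (-history[parent][0], parent))
    return shown
```

With a linear chain the queue never holds more than one entry, which is why the priority only starts to matter once a merge commit puts two parents in at the same time.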

Branch names (and other names) find commits; these names are not shared

The branch names in some repository serve to find commit hash IDs. Each branch name stores one single hash ID. For instance, dev might mean commit a123456... (the actual hash IDs are big and ugly and nobody can remember them, which is why we have branch names to remember them for us).

Each repository has its own private names. When you hook two repositories to each other, they get to see each other's names (to some extent), but they're not required to use the same names. What's shared are the commits themselves. Commits have universal hash IDs: commit a123456... has that hash ID in every Git repository that has the commit, and any Git repository that doesn't have the commit does not have any commit with that hash ID.

Two Gits, in other words, decide if they both have commit a123456... by checking the ID, not the name. This is important since they may not use the same name for that hash ID. One Git might have six names for that ID, and another Git might have only one name. Or, repository R1 might have names fred and barney, and repository R2 might have wilma and betty instead. But if they both have commit a123456..., they both use the same hash ID for that particular commit.

In a sense, the hash ID is the commit (except that it's a lot shorter). The hash ID represents the commit, and if your Git doesn't have the hash ID, it doesn't have the commit either; if it does have the hash ID, it has the commit.

Commits store commit hash IDs

To make the whole backwards-chain thing work in the first place, each commit stores the raw hash ID of its predecessor. There are no names involved here, just raw hash IDs:

... <-F <-G <-H

This represents a chain of commits (probably 8 commits, given the letters that come before F :-) ) in a row, where we've replaced the actual random-looking hash IDs with single uppercase letters. The last commit in this chain has hash ID H. It consists of a snapshot of all files, plus some metadata, and the metadata in commit H contains the raw hash ID of earlier commit G.

Commit G contains a snapshot and metadata too. Its metadata gives the hash ID of still-earlier commit F. We say that H points to G, and G points to F. F, of course, points to a still-earlier commit, and so on, backwards. This is the history: some name allows us (or Git) to find commit H, and from there, Git works backwards.
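
These last two points fit together: the hash ID is computed from the commit's own content, and that content includes the parent's hash ID. Here is a small Python sketch of that computation for SHA-1 repositories. The function name `commit_id`, the author string, and the timestamp are invented for illustration; the empty-tree ID is the well-known constant.

```python
import hashlib

def commit_id(tree, parents, author, message):
    """SHA-1 object ID of a commit, using the object layout of SHA-1 repositories."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = f"commit {len(body)}\0".encode()   # object type, size, NUL byte
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"   # ID of the empty tree
who = "A U Thor <author@example.com> 1621384912 +0000"    # made-up identity

# Two repositories with different private names for the same commit.
r1_names = {"fred": commit_id(EMPTY_TREE, [], who, "initial")}
r2_names = {"wilma": commit_id(EMPTY_TREE, [], who, "initial")}

# Same snapshot, author, and message, but a different parent: a different commit.
child = commit_id(EMPTY_TREE, [r1_names["fred"]], who, "initial")
```

Both repositories arrive at the same ID without talking to each other, and changing nothing but the parent produces a new ID, which is exactly why a copied commit can never keep the original's hash.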

Returning to your question

When developers do work, they're going to make new commits. Exactly how they go about making new commits is not terribly important, but each new commit will have a full snapshot of every file—every commit has that; that's a fundamental part of Git; the snapshots are compressed and de-duplicated so that we don't have to worry about using up the disk drive—and some metadata. The metadata will have their name as author and committer, a date-and-time-stamp, a log message, and so on; and the metadata of each new commit will automatically point back to some older commit.

In other words, they might start out with:

...--G--H   <-- dev

They could then create a new, arbitrary branch name X, or just use the name dev. It doesn't really matter that much now, because they'll have another name, a non-branch name, origin/dev, that finds commit H for now. It could matter later, because as soon as they run git fetch to get any new commits from origin, that might give them a new commit on origin's dev, and their Git will then update their own origin/dev.

So, let's say they create a branch named X and use git checkout or git switch to make it their current branch, like this:

...--G--H   <-- dev, origin/dev, X (HEAD)

The HEAD notation here shows which branch name is the current branch. It's actually the branch name that selects the commit, so that commit H is the current commit. Regardless of which of these names they use to find H, they will find H right now.

Now they make a new commit (in the usual way). This new commit gets a new, unique (across every Git repository) hash ID: maybe b789abc..., but we'll just call it commit I. Their Git writes this commit into their repository database. It makes I have H as parent, because H is the current commit at the moment:

...--G--H
         \
          I

The name dev does not update, because that's not their current branch, so it still points to H. The name origin/dev does not update either: only git fetch updates it, and only if the other Git repository has changed its dev branch. origin/dev is our developer's Git's method of remembering some other Git's dev. But the name X, which is the current branch, does change: Git writes I's hash ID into this name, so that it continues to point to the newest commit:

...--G--H   <-- dev, origin/dev
         \
          I   <-- X (HEAD)

If our developer makes another new commit J, this commit will point back to existing commit I, and now X points to J:

...--G--H   <-- dev, origin/dev
         \
          I--J   <-- X (HEAD)
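
The bookkeeping in these drawings is small enough to model directly. The following Python sketch uses a hypothetical `Repo` class of my own, with letters in place of hash IDs; it only tracks parents and names, not snapshots.

```python
class Repo:
    def __init__(self, names, head):
        self.parents = {}          # commit -> parent commit
        self.names = dict(names)   # branch and remote-tracking names -> commit
        self.head = head           # name of the current branch

    def commit(self, new):
        """Add `new` on top of the current commit and advance the current branch."""
        self.parents[new] = self.names[self.head]   # parent = current commit
        self.names[self.head] = new                 # only the current branch moves

repo = Repo({"dev": "H", "origin/dev": "H", "X": "H"}, head="X")
repo.commit("I")
repo.commit("J")
```

After the two calls, `X` names J while `dev` and `origin/dev` still name H, matching the drawing above.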

Assuming that commit J finishes off the work, it's now time for our developer to put his new commit(s) into action. But meanwhile, over on the origin Git, someone else has added new commits. Our developer should run git fetch now. (Note that git fetch is the first part of git pull, but I'm going to suggest that our developer not bother with git pull at all here.)

Let's draw a picture of the corporate repository, which used to go like this:

...--G--H   <-- dev

It now has some new commits. We need a letter other than I or J; let's jump to P and assume there's just one commit:

...--G--H--P   <-- dev

Note that this repository's dev now points to commit P.

Our developer runs git fetch or git fetch origin. (The latter just says where to go for the fetch; if there's only one other repository to call up for fetching, there's no need to say that.) The two Gits hook up, and, since the direction of transfer is "from corporate to worker-bee", the only new commit is commit P. Our developer gets commit P in his repository, and his Git updates his origin/dev to point to it:

          P   <-- origin/dev
         /
...--G--H   <-- dev
         \
          I--J   <-- X (HEAD)
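
As a rough Python model of what that fetch did (a sketch only: the function `fetch` and the dictionaries are hypothetical, G is treated as the root for brevity, and the real protocol negotiates which commits to send rather than looping over everything):

```python
def fetch(local_parents, local_names, remote_parents, remote_names, remote="origin"):
    """Copy commits we lack, then remember the remote's branch tips as remote/<name>."""
    for commit, parent in remote_parents.items():
        local_parents.setdefault(commit, parent)   # existing commits never change
    for name, tip in remote_names.items():
        local_names[f"{remote}/{name}"] = tip      # our own branches are untouched

# The developer's repository before the fetch.
parents = {"G": None, "H": "G", "I": "H", "J": "I"}
names = {"dev": "H", "origin/dev": "H", "X": "J"}

# The corporate repository, which has gained commit P.
fetch(parents, names, {"G": None, "H": "G", "P": "H"}, {"dev": "P"})
```

Only `origin/dev` moves; the developer's own `dev` and `X` stay exactly where they were.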

Our developer doesn't actually need the name dev at all. I recommend that he just delete it, which lets us draw this more simply:

...--G--H--P   <-- origin/dev
         \
          I--J   <-- X (HEAD)

This is the same drawing; we just took one name away. Our developer doesn't really need to find commit H directly via the name dev. (He can, if he wants to, keep the name so that he can find where he started from, but origin/dev will actually work fine here.)

Our developer now needs to make a new commit that follows commit P. We'll call this commit K, which we made room for. No part of any existing commit can ever be changed! The content for new commit K should be:

metadata: author, committer, log message, etc., as dictated by the developer;
metadata: parent: P
snapshot: the result of merging J and P

There are lots of ways for our developer to get new commit K made. He can:

use git rebase -i or similar to squash together I and J into a new IJ that follows H, then use git rebase to copy IJ to a new K that follows P;
use git rebase to copy I and J to new commits I' and J' that follow P, then squash I' and J' together to make K;
use git merge --squash to make K directly (this might be my personal choice);

or anything else the developer likes. It doesn't matter how our developer gets there. What matters is that he makes commit K:

             K   <--  new-name (HEAD)
            /
...--G--H--P   <-- origin/dev
         \
          I--J   <-- X

This is a picture of the result of creating a new branch name, checking it out, running git merge --squash X, and committing the result (git merge --squash stops before making the commit), with:

git switch -c new-name origin/dev
git merge --squash X
git commit

But:

git rebase origin/dev

would do this:

             I'-J'  <-- X (HEAD)
            /
...--G--H--P   <-- origin/dev
         \
          I--J   [abandoned]

after which running git rebase -i origin/dev and changing the second pick to squash would do this:

               I'-J'  [abandoned]
              /
             /--K   <-- X (HEAD)
            /
...--G--H--P   <-- origin/dev
         \
          I--J   [abandoned]

No matter how they go about this, as long as they end up with the desired result (a commit, or several commits, that simply add on to commit P), they can now send the new commits:

git push origin new-name:dev

would be the way to do this with the drawing I made for the make-a-new-branch-then-git merge --squash method, and:

git push origin X:dev

would be the way to do this with the drawing just above. Either way, the developer has his Git call up the corporate Git and send the new commits, found by name (new-name or X), that aren't in the corporate Git. The old I and J commits, and any abandoned copies, aren't found by the name being pushed: they're all abandoned or irrelevant. So only commit K goes over. Then the developer's Git asks (politely) that the corporate Git update their dev name, so that their dev would point to commit K.

It's this git push that results in a proper fast-forward

This git push (run without --force) sends some commit(s) to the corporate Git server, then sends the polite request: Please, if it's OK, set your name dev to point to this last new commit. The corporate Git server checks to make sure that the new commits add on to the commit that is currently last for that branch name.

If the new commits do just add on, the corporate Git treats the request as a fast-forward, and does it; it replies: OK, I did that; and the developer's Git updates their origin/dev because they see that the polite request was granted.

If, by whatever coincidences occurred, commit K doesn't add on to the corporate chain (for instance, if developer13 added on to commit P so that the corporate Git chain now ends at commit Q):

...--H--P--Q   <-- dev
         \
          K   [request]

then the corporate Git says No, I can't do that because it would drop some commit(s). They don't get specific about commit Q here, they just say no, it's not a fast-forward. The developer's Git then prints a complaint for the developer to see, rejected (non-fast-forward), and does not update his origin/dev. He must run git fetch to obtain commit Q from the corporate Git now, and come up with a new commit (let's call it L or K') that adds on to Q.
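
The server-side decision boils down to one ancestry test. Here is a minimal Python sketch of it, assuming a history with no merge commits (one parent per commit, which is what this workflow produces). The function names `is_ancestor` and `receive_push` and the returned strings are my own, not anything from Git's code.

```python
def is_ancestor(parents, old, new):
    """True if `old` can be reached from `new` by following parent links."""
    while new is not None:
        if new == old:
            return True
        new = parents.get(new)
    return False

def receive_push(server_parents, server_names, new_commits, name, tip):
    """Accept a non-forced push only if it is a fast-forward."""
    candidate = {**server_parents, **new_commits}
    if not is_ancestor(candidate, server_names[name], tip):
        return "rejected (non-fast-forward)"
    server_parents.update(new_commits)   # keep the new commits
    server_names[name] = tip             # move the branch name forward
    return "ok"

# Case 1: the server's dev is at P and K follows P.
parents = {"H": None, "P": "H"}
names = {"dev": "P"}
accepted = receive_push(parents, names, {"K": "P"}, "dev", "K")

# Case 2: the server's dev has moved on to Q, but K still follows P.
parents2 = {"H": None, "P": "H", "Q": "P"}
names2 = {"dev": "Q"}
refused = receive_push(parents2, names2, {"K": "P"}, "dev", "K")
```

In the second case, walking back from K never reaches Q, so moving `dev` to K would drop Q and the request is refused.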

Now, some of your developers might be uncomfortable with the idea that their branch names don't match your corporate branch names. If that's the case, you can let them keep their local dev branch names, and spend a bunch of time and effort updating their dev to match their origin/dev after using git fetch to update their origin/dev . But this really isn't terribly useful. It's a whole bunch of unnecessary running-around. If they learn how to operate Git directly, they can avoid all of this.

If you go through GitHub or other web hosting sites

The big drawback to letting developers push directly like this is a lack of control: you have to trust your developers (not to use git push --force, for instance). If you go through the fancy hosting sites that have controls like protected branches and forks and so on, you can prevent developers from breaking stuff. This does, however, put more of a burden on you, or some trusted developers, to manage the merges into the main repository. Only you can decide whether this sort of thing is worthwhile.

If you want to do this on your own corporate servers, there are systems like Gitolite that let you do stuff similar to what GitHub does, without having to involve GitHub or Bitbucket or GitLab. Gitolite does not have a fancy issue tracker and code review system, though, so the web hosting sites may have other features you'd like.
