Git: changing committers info

Question

I'm using this script to modify commits:

rm -rf repo

echo "cloning $1"
git clone "$1" repo

cd repo
git checkout dev

echo "setting remote origin to $2"
git remote set-url origin "$2"

array=( 'email1@gmail.com' 'email2@gmail.com' )
for OLD_EMAIL in "${array[@]}"
do
  echo $OLD_EMAIL
  git filter-branch -f --env-filter '
  CORRECT_NAME="New name"
  CORRECT_EMAIL="new@email.com"
  if [ "$GIT_COMMITTER_EMAIL" = '$OLD_EMAIL' ]
  then
      export GIT_COMMITTER_NAME="$CORRECT_NAME"
      export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
  fi
  if [ "$GIT_AUTHOR_EMAIL" = '$OLD_EMAIL' ]
  then
      export GIT_AUTHOR_NAME="$CORRECT_NAME"
      export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
  fi
  ' --tag-name-filter cat -- --tags
done
echo "Authors list:"
git log --format='%cE' | sort -u
echo -n "Push to destination (y/n)? "
read answer
if echo "$answer" | grep -iq "^y" ;then
    git push
else
    echo Aborted
fi

cd ../

It pulls data from the first repo, modifies the committer info, and pushes to the second repo.

The problem arises if someone commits directly to the second repo. How do I apply those changes to the first repo?

Answer 1

If I'm understanding your question correctly (after reading the comments), your repo currently looks something like this:

The commits in the first repo (a-d) have been modified to create the alternate commits (a'-d'), which were pushed into a second repo and then had additional commits (e-g) added.

Re-editing Your History

Because the identity information in the two repos doesn't map 1:1 (your script collapses several old emails into one new identity), you can't simply reverse the rewrite. Restoring the original history by running filter-branch over a'-d' is theoretically possible, but it requires a way to positively identify each 'original commit' without the one piece of information that positively identifies a commit: its hash.

A commit is basically made up of a few pieces of information:

1. The hash of the tree
2. The hash(es) of the commit's parent(s)
3. The author's identity information
4. The timestamp of the authoring
5. The committer's identity information
6. The timestamp of the commit
7. The commit message
8. The size of all that information

All this is hashed to create the unique identifier for your commit. Having altered 2, 3, 5, and 8, we're left with the tree, which is not necessarily unique, the timestamps, which are not necessarily unique, and the commit message, which is not necessarily unique.
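To see these pieces for yourself, here is a minimal sketch that builds a throwaway repository, prints the raw commit object, and recomputes the commit ID from that content. The temporary directory, names, and emails are made up for the demo.

```shell
set -e
# Throwaway repository; every name and email here is a placeholder.
demo=$(mktemp -d)
cd "$demo"
git init -q .
echo hello > file.txt
git add file.txt
GIT_AUTHOR_NAME="Old Name" GIT_AUTHOR_EMAIL="old@example.com" \
GIT_COMMITTER_NAME="Old Name" GIT_COMMITTER_EMAIL="old@example.com" \
git commit -q -m "first commit"

# The raw commit object: tree, author, committer, message
# (no parent line, because this is a root commit).
git cat-file -p HEAD

# Hashing that exact content as a commit object reproduces the commit ID.
actual=$(git rev-parse HEAD)
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "actual:     $actual"
echo "recomputed: $recomputed"
```

Change any one of those fields and the recomputed ID changes with it, which is why a'-d' no longer share hashes with a-d.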

Odds are you could get a decent match from just comparing the tree and one of the timestamps, so let's write a little pseudo-code for that scenario.

# Sketch for the --env-filter of a filter-branch run in the second repo;
# filter-branch sets $GIT_COMMIT to the commit being rewritten.
# /path/to/firstrepo is a placeholder.

# create a variable to hold the information from the current commit
pseudoidentifier=$(git show -s --format='%T %at' "$GIT_COMMIT")

# search the first repo's log for the same tree and author timestamp;
# sed removes everything from the first delimiter on, leaving the hash
# (--git-dir is used instead of cd because filter-branch exports GIT_DIR)
oldhash=$(git --git-dir=/path/to/firstrepo/.git log --all --format='%H~%T %at~' |
    grep -F "~$pseudoidentifier~" | sed 's/~.*$//' | head -n 1)

# get the original identity using custom formatted show commands
if [ -n "$oldhash" ]
then
    CORRECT_NAME=$(git --git-dir=/path/to/firstrepo/.git show -s --format='%cn' "$oldhash")
    CORRECT_EMAIL=$(git --git-dir=/path/to/firstrepo/.git show -s --format='%ce' "$oldhash")

    export GIT_COMMITTER_NAME="$CORRECT_NAME"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi

Unfortunately, this would be slow to write and difficult and time-consuming to test, probably requiring you to re-run the entire thing multiple times. Since your ultimate goal is to reunite the code, there are several other options that will likely cause far less headache and be much faster, especially if you do need to keep the second repo with the identity updates intact.

Alternate Methods

Without a common history, you can still bring the two into sync using somewhat more manual means. Here are three methods I would recommend in this situation.

A little pre-work

Before we begin, we can check whether the code at d and d' is indeed identical. We can do this with the git show command:

$ git show -q --format="%T" d
a017285da45ec06fc744815f33a2e22627f4a799
$ git show -q --format="%T" d'
a017285da45ec06fc744815f33a2e22627f4a799

This command outputs the tree object the commit points to; if the two trees match, you're dealing with identical code. It is entirely possible to perform the following procedures without a matching code base, but you're likely to have to resolve conflicts in that case. This step really just tells you how easily the two will come together.
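As a rough illustration of that check, the sketch below creates a commit d in a throwaway repo, then uses git commit-tree to build a stand-in for d' with the same tree but a different identity. The identities and the temporary directory are placeholders, and commit-tree is only used here to imitate what filter-branch did.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
echo hello > file.txt
git add file.txt
GIT_AUTHOR_NAME="Old Name" GIT_AUTHOR_EMAIL="old@example.com" \
GIT_COMMITTER_NAME="Old Name" GIT_COMMITTER_EMAIL="old@example.com" \
git commit -q -m "d"
d=$(git rev-parse HEAD)

# Stand-in for d': same tree, rewritten identity.
tree=$(git rev-parse "$d^{tree}")
d_prime=$(GIT_AUTHOR_NAME="New name" GIT_AUTHOR_EMAIL="new@email.com" \
    GIT_COMMITTER_NAME="New name" GIT_COMMITTER_EMAIL="new@email.com" \
    git commit-tree "$tree" -m "d")

# Compare the trees the two commits point to.
tree_d=$(git show -s --format=%T "$d")
tree_d_prime=$(git show -s --format=%T "$d_prime")
if [ "$tree_d" = "$tree_d_prime" ]
then
    echo "identical code (commits $d and $d_prime differ only in metadata)"
else
    echo "trees differ, expect conflicts"
fi
```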

The Cherry-Pick method

If the repo you used to originally modify the commits is intact, you can fetch the branches from both into a single repo and attempt to use cherry-pick to copy the commits.

git checkout <branch at d>
git cherry-pick d'..g

(Note that the range syntax is 2 dots.) This will apply the changes from each commit after (but not including) d' up to and including g onto d, creating new commits e'-g'.
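Here is a small, self-contained sketch of that procedure, assuming both histories have already been fetched into one repo. The branch names first and second, the tag d-prime, the commit_as helper, and the identities are all invented for the demo. It uses the two-dot range d-prime..second, which selects the commits reachable from second but not from d-prime.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/first

# Demo helper (hypothetical): stage everything and commit under an identity.
commit_as() {
    git add -A
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
    GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
    git commit -q -m "$3"
}

# First repo's history: just d.
echo "line d" > file.txt
commit_as "Old Name" old@example.com "d"

# Second repo's history: d' (same files, new identity, no shared ancestry),
# followed by e, f and g.
git checkout -q --orphan second
commit_as "New name" new@email.com "d'"
git tag d-prime
for c in e f g
do
    echo "line $c" >> file.txt
    commit_as "New name" new@email.com "$c"
done

# Copy e..g onto d.
git checkout -q first
GIT_COMMITTER_NAME="Old Name" GIT_COMMITTER_EMAIL="old@example.com" \
git cherry-pick d-prime..second > /dev/null

git log --format='%s  author=%ae  committer=%ce' first
```

Cherry-pick keeps each commit's author but records whoever runs it as the committer, so you may still want to review the identities afterwards.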

The Patch Method

If you don't have an easy way to bring the changes from both branches into a single repo, you can create a series of patches for the commits on the second repo and apply them to the first.

In the second repo

git checkout <branch of g>
git format-patch --output-directory <dir> d'..g

(Again, the range syntax is 2 dots.) This will output a patch file for each commit after (and not including) d' up to and including g. Then copy these files to somewhere you can reach them from the first repo in order to apply the patches.

In the first repo

git checkout <branch of d>
git am /path/to/patches/*

You'll end up in the same place you did from the cherry pick method.
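A minimal end-to-end sketch of the patch route, using two separate throwaway repos. The directory names first, second, and patches, the tag d-prime, and the Demo identity are assumptions made for the example.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# First repo: ends at d.
git init -q first
echo "line d" > first/file.txt
git -C first add file.txt
git -C first commit -q -m "d"

# Second repo: d' has the same content, then e, f and g on top.
git init -q second
cp first/file.txt second/
git -C second add file.txt
git -C second commit -q -m "d'"
git -C second tag d-prime
for c in e f g
do
    echo "line $c" >> second/file.txt
    git -C second commit -q -a -m "$c"
done

# In the second repo: one patch file per commit after d'.
git -C second format-patch -q --output-directory "$demo/patches" d-prime..HEAD

# In the first repo: replay the patches on top of d.
git -C first am -q "$demo"/patches/*.patch

git -C first log --format='%s'
```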

Create a Graft

If there are a lot of conflicts and you don't need to keep the identity-altered information, you can also use git replace to perform a graft.

git replace --graft e d

This will create a copy of commit e (call it e') with d as its parent and add a replacement reference telling git to use e' whenever it attempts to access e, effectively making d the common ancestor of both histories and allowing you to perform a traditional merge (h).
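The following sketch walks through a graft in a throwaway repo. The branch names first and second, the lightweight tags d and e, the extra commit x, and the Demo identity are all made up for illustration.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
git init -q .
git symbolic-ref HEAD refs/heads/first

# First history: d.
echo "line d" > file.txt
git add file.txt
git commit -q -m "d"
git tag d

# Second history: d' (no shared ancestry), then e, f and g.
git checkout -q --orphan second
git commit -q -m "d'"
for c in e f g
do
    echo "line $c" >> file.txt
    git commit -q -a -m "$c"
    if [ "$c" = e ]; then git tag e; fi
done

# Some independent work on the first history after d.
git checkout -q first
echo "only in first" > other.txt
git add other.txt
git commit -q -m "x"

# Graft: make git treat e as if its parent were d.
git replace --graft e d
base=$(git merge-base first second)
echo "common ancestor is now: $(git log -1 --format=%s "$base")"

# A traditional merge (h) now works.
git merge -q -m "h" second
git log --format='%s' first
```

The replacement lives in refs/replace/ of the repo where you ran the command, so other clones only see the graft if you push and fetch those refs as well.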

Then what?

Keeping two repos without a common history in sync will consistently cause you problems like this, and they will get worse as the two slowly diverge (for example, as you resolve conflicts). Over time, any of these methods will require more and more effort to keep the two repos in sync.

I would recommend that once the two repos are synchronized, pick one of them and use that one exclusively from then on. If you require two remotes, just push that repo to both of them. You can then easily use any of the many tried and true workflows to maintain the two repos.
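One way to push a single repo to both remotes is to give one remote two push URLs. The sketch below tries this with two local bare repositories standing in for your two remotes; the paths, the dev branch, and the identity are placeholders.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Two bare repositories stand in for the two remotes.
git init -q --bare first.git
git init -q --bare second.git

# The one working repo you keep using.
git init -q work
cd work
echo hello > file.txt
git add file.txt
GIT_AUTHOR_NAME="New name" GIT_AUTHOR_EMAIL="new@email.com" \
GIT_COMMITTER_NAME="New name" GIT_COMMITTER_EMAIL="new@email.com" \
git commit -q -m "shared history"

git remote add origin "$demo/first.git"
# Once any push URL is set, only push URLs are used, so list both repos.
git remote set-url --add --push origin "$demo/first.git"
git remote set-url --add --push origin "$demo/second.git"

# A single push now updates both repositories.
git push -q origin HEAD:refs/heads/dev

git --git-dir="$demo/first.git" rev-parse refs/heads/dev
git --git-dir="$demo/second.git" rev-parse refs/heads/dev
```

Because both remotes receive the very same commits, their histories stay identical and none of the matching tricks above are needed.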

If this is not an option, I'd recommend meticulously and frequently checking the trees of the heads of your two repos to verify that they are bit-for-bit identical.

Answer 2

You have two options to get this done:

If you trust the users, you can have them change their email (for this git repo only, or add --global to apply it to all repos):

 git config user.email email@server.com

If you want to enforce it, you can use a pre-commit git hook that you add to the second repository and have them all pull the new update. More about this can be found here and here.
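As a rough idea of what such a hook could look like, the sketch below installs a pre-commit hook in a throwaway repo and tries one commit with a wrong email and one with the expected email. The expected address is taken from the question's script; the hook body itself is only an assumption about how you might write the check. Keep in mind that hooks live in each clone's .git/hooks directory and are not transferred by clone or pull, so every user would have to install it.

```shell
set -e
unset GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "New name"

# Install the hook.
mkdir -p .git/hooks
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
expected="new@email.com"
# git var prints "Name <email> timestamp timezone"; keep only the email.
email=$(git var GIT_AUTHOR_IDENT | sed 's/^.*<\(.*\)>.*$/\1/')
if [ "$email" != "$expected" ]
then
    echo "pre-commit: author email is $email, expected $expected" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/pre-commit

echo hi > file.txt
git add file.txt

# A commit with one of the old emails should be refused.
git config user.email email1@gmail.com
if git commit -q -m "wrong email" 2>/dev/null
then wrong=accepted
else wrong=rejected
fi

# A commit with the expected email should go through.
git config user.email new@email.com
if git commit -q -m "right email" 2>/dev/null
then right=accepted
else right=rejected
fi

echo "wrong email: $wrong, right email: $right"
```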
