How do I rebase a git superproject changing the hashes of the submodules?

Question

Background

Assume we have two git repos, one a submodule of the other (A will be the superproject, B will be the submodule). Project A is not source code per se, but rather a project that gathers and tracks information about its submodule(s). The A repo rarely, if ever, exists on local machines; instead, a bunch of scripts keep it updated.

One day, someone realized that repo B should have been using LFS better and cleaned up the repo using git lfs migrate import. That rewrote B's history, so I have a list of B's old hashes and the corresponding new hashes.

What I did

As repo A happens to be linear (no branching), I was able to do a git rebase --root -i, change all the commits to edit, and run a simple bash script that reset the submodule to the new hashes. Here's an example of the script:

#!/bin/bash
# Set the submodule path and the input files.
# Both files hold one hash per line, in the same order.
submodulePath=foo
newHashesFile=NewHashes.txt
originalHashesFile=OriginalHashes.txt

# Keep going for as long as a rebase is in progress
while test -d "$(git rev-parse --git-path rebase-merge)" || test -d "$(git rev-parse --git-path rebase-apply)"; do
    numLines=$(git ls-files --stage | grep -c "$submodulePath")
    if [ "$numLines" -eq 1 ]; then
        # A single stage-0 entry: no conflict on the gitlink
        oldHash=$(git ls-files --stage | grep "$submodulePath" | sed -e 's/^160000 \([^ ]*\) 0.*$/\1/')
        echo "oldHash: $oldHash"
    else
        echo "merge conflict"
        # Several entries: take the hash from the stage-3 entry
        oldHash=$(git ls-files --stage | grep "$submodulePath" | grep '^160000 [^ ]* 3' | sed -e 's/^160000 \([^ ]*\) 3.*$/\1/')
        echo "oldHash: $oldHash"
    fi

    # Line number of the old hash; the new hash is on the same line of the other file
    lineNumber=$(grep -n "$oldHash" "$originalHashesFile" | sed -e 's/^\([^:]*\):.*/\1/')

    if [ -z "$lineNumber" ]; then
        echo "Hash not changed"
    else
        newHash=$(head -n "$lineNumber" "$newHashesFile" | tail -n 1)
        (cd "$submodulePath" && git reset --hard "$newHash")
    fi

    git add "$submodulePath/"
    # --no-edit keeps the existing commit message without opening an editor
    git commit --amend --no-edit
    git rebase --continue
done

Question

All this worked, but I was wondering if there is a simpler way to do so, as I assume I'll be called on to do this again. There are two parts to that question.

Is there a simple way to tell git that you want the default to be edit instead of pick, not dependent on the editor?
Is there a simpler way of telling git to do what the script does? Would it help if I did the git lfs migrate import from within the superproject?

Answer 1

Is there a simple way to tell git that you want the default to be edit instead of pick, not dependent on the editor?

No. There is, however, a way to set the sequence-of-commands editor to a separate editor from other editors: set the environment variable GIT_SEQUENCE_EDITOR. So, for instance, you can do:

GIT_SEQUENCE_EDITOR="sed -i '' s/^pick/edit/" git rebase -i ...

(assuming your sed has a -i that works this way; the separate '' argument is the BSD/macOS form, whereas GNU sed takes -i with no argument).
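
As a rough sketch of the same idea that avoids the GNU/BSD difference, the suffix form -i.bak is accepted by both seds. The function name pick_to_edit and the stand-in todo file below are made up for illustration; only GIT_SEQUENCE_EDITOR itself comes from the answer above.

```shell
#!/bin/bash
# pick_to_edit FILE: change every leading "pick" in a rebase todo file to "edit".
# -i.bak (suffix attached to -i) works with both GNU and BSD sed.
pick_to_edit() {
    sed -i.bak -e 's/^pick /edit /' "$1" && rm -f "$1.bak"
}

# Demo on a stand-in todo file (not a real rebase)
todo=$(mktemp)
printf 'pick 1111111 first commit\npick 2222222 second commit\n# comment line\n' > "$todo"
pick_to_edit "$todo"
cat "$todo"

# With git itself, the equivalent one-liner would be along the lines of:
#   GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick /edit /'" git rebase -i --root
```

The one-liner leaves a .bak copy of the todo file inside the rebase state directory, which git removes when the rebase finishes.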

Is there a simpler way of telling git to do what the script does?

Given that you want to update each gitlink hash, I'd use git filter-branch (rather than git rebase) to do it, with an --index-filter that does the gitlink hash updates. I'm not sure this is any simpler, but it's more direct. The index filter itself would use git ls-files --stage much as you do, but would probably feed its output through a generated sed script or an awk script. Generated sed would probably be faster, while awk would be simpler, especially if you have a modern awk where you can just read in the hash mapping.
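
A minimal sketch of the awk variant might look like the following. It assumes a mapping file with one "old,new" pair per line; the function name remap_gitlinks and the sample hashes are hypothetical. The function reads git ls-files --stage output on stdin and prints only the gitlink entries (mode 160000) whose hash has a mapping, with the hash replaced.

```shell
#!/bin/bash
# remap_gitlinks MAPFILE: filter `git ls-files --stage` lines from stdin.
# MAPFILE is assumed to contain "oldhash,newhash" on each line.
remap_gitlinks() {
    awk -v map="$1" '
        BEGIN {
            # Read the whole mapping into an array keyed by old hash
            while ((getline line < map) > 0) {
                split(line, pair, ",")
                new_for[pair[1]] = pair[2]
            }
        }
        # Only gitlink entries with a known old hash are rewritten and printed
        $1 == "160000" && ($2 in new_for) {
            sub($2, new_for[$2])   # edit $0 in place so the tab before the path survives
            print
        }
    '
}

# Demo with made-up hashes
old=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
new=bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
map=$(mktemp)
echo "$old,$new" > "$map"
printf '100644 %s 0\tREADME\n160000 %s 0\tfoo\n' cccccccccccccccccccccccccccccccccccccccc "$old" |
    remap_gitlinks "$map"
```

The output keeps the "mode hash stage TAB path" layout, which is one of the input formats git update-index --index-info reads, so inside an index filter the result could be piped into that command.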

Answer 2

After having to do this a few times over the years, I took torek's advice and wrote my overly verbose bash script as a single git filter-branch . I'm posting it here, both for other users and future me.

First, just to clarify how I did the lfs migrate import (and I'm sure I took the long route for some of these lines):

# Make sure we have the up-to-date remote branches
git submodule update --init SubmodulePath/
cd SubmodulePath/
git fetch --all

# Create local branches that mirror the remote ones
git branch -lr | grep -v "origin/HEAD" | sed 's/^.*origin\///' | 
   xargs -I @ git branch @ origin/@ --force

#Find all files that git identifies as binary and create the lfs migrate command, then run it
git log --all --numstat | grep '^-' | cut -f3 | sed 's|^.*/\(.*\)|\1|' | sed 's|^.*\.\([^.]*\)|\1|' |
   sort -u --ignore-case | sed 's|\([^0-9]\)|[\L\1\U\1]|g' | awk '{print}' ORS=',*.' |
   sed 's|^\(.*\),\*\.$|git lfs migrate import --everything --object-map=LFSImport.txt --include="*.\1"|' | . /dev/stdin
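
To make the middle of that pipeline easier to follow: each file extension is turned into a case-insensitive glob before the globs are joined into the --include list. The sketch below reproduces just that step with awk instead of the \L and \U sed escapes (which are a GNU sed extension); the function name ext_to_glob is made up for illustration.

```shell
#!/bin/bash
# ext_to_glob EXT: print a case-insensitive glob for one extension.
# Digits are kept as they are; every other character c becomes [cC].
ext_to_glob() {
    printf '%s\n' "$1" | awk '{
        out = "*."
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[0-9]/)
                out = out c
            else
                out = out "[" tolower(c) toupper(c) "]"
        }
        print out
    }'
}

ext_to_glob png
ext_to_glob mp3
```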

I then moved LFSImport.txt to a different directory (I also committed it to the submodule repo) and ran the filter-branch with --index-filter:

git filter-branch -f --index-filter '
   numLines=$(git ls-files --stage | grep -c SubmodulePath)
   if [ "$numLines" -eq 1 ]; then
     echo
     oldHash="$(git rev-parse --quiet --verify :SubmodulePath)"
     echo "oldHash: $oldHash"
     newHash="$(grep "$oldHash" /path/to/LFSImport.txt | cut -d , -f2)"
     echo "newHash: $newHash"
     git update-index --add --cacheinfo 160000 "$newHash" SubmodulePath
   fi
   ' HEAD

I probably should have added a check that $newHash is not empty (it was empty for one commit of mine, but I just manually set it to something else that didn't exist). As torek mentioned, this was cleaner, faster and worked just as well, if not better.
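
A sketch of what such a check could look like, assuming the same "old,new" layout that the cut -d , -f2 above relies on. The helper name lookup_new_hash and the sample hashes are hypothetical.

```shell
#!/bin/bash
# lookup_new_hash OLD MAPFILE: print the new hash for OLD.
# Returns non-zero (and prints nothing) when MAPFILE has no entry for OLD.
lookup_new_hash() {
    found=$(grep "^$1," "$2" | cut -d , -f2)
    [ -n "$found" ] && printf '%s\n' "$found"
}

# Demo with made-up hashes
map=$(mktemp)
echo "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" > "$map"

for old in aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa cccccccccccccccccccccccccccccccccccccccc; do
    if newHash=$(lookup_new_hash "$old" "$map"); then
        echo "would update gitlink to $newHash"
    else
        echo "no mapping for $old, leaving gitlink untouched" >&2
    fi
done
```

Inside the index filter, the same [ -n "$newHash" ] test would wrap the git update-index call, so a commit whose submodule hash has no mapping is left as it is rather than being given an empty hash.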

2 answers

solution1
1 ACCEPTED 2019-10-03 15:56:50

solution2
1 2022-11-13 17:58:23
