How to refer to the current path in this recursive find and replace?

Question

Disclaimer: (off-topic warning) This is not about outputting the list of ignored files actually detected in the repo. This is about ignored paths, even when no file actually matches one of these paths.

Context: I'm attempting to write a git alias to "flatten" all .gitignore patterns recursively and output a list of paths as they're seen from the top level.

What I mean, with an example:

├─ .git
├─ .gitignore
└─ dir1
    ├─ .gitignore
    ├─ file1.txt
    └─ file2.txt

With these contents in .gitignore files:

# (currently pointing at top-level directory)
$ cat .gitignore
some_path

$ cat dir1/.gitignore
yet_another_path
*.txt

I try to have an alias to output something along the lines of

$ git flattened-ignore-list
some_path
dir1/yet_another_path
dir1/*.txt

What do I have so far?

I know I can search for all .gitignore files in the repo with

find . -name ".gitignore"

which in this case would output (leaving out the ./ prefix that find actually adds to each path)

.gitignore
dir1/.gitignore

So I've tried to combine this with cat to get their contents (either of these works)

find . -name ".gitignore" | xargs cat
# or
cat $(find . -name ".gitignore")

with this result:

some_path
yet_another_path
*.txt

which is technically expected but unfortunately unhelpful for what I am trying to achieve. So to (at last!) arrive at my actual question:

How can I, for each result of find, refer to the current path? (in order to eventually prepend it to the line)

Note for people suspecting an XY problem: it might be the case; my approach might just be naive here, but maybe not, I'm unsure. For example, I didn't consider complex cases where nested .gitignore files could refer to upper levels, or special syntax with **. I've stuck to very simple structures for now, so if you see a flaw and/or can suggest a totally different way to achieve the same goal, I'll of course be happy to hear about it too.

Answer 1

I try to have an alias to output something along the lines of
$ git flattened-ignore-list
some_path
dir1/yet_another_path
dir1/*.txt

Unfortunately, this approach is naive (and perhaps doomed, but maybe not) because entries in .gitignore files are a bit complicated.

The simple answer to the simple question you asked is to use something that prepends the directory name, relative to the top level. Since find never outputs unnecessarily complicated names, you can do this with direct string processing:

.gitignore
dir1/.gitignore

tells you that when reading the first file you should prepend nothing, and when reading the second you should prepend dir1 to each entry. Doing this in shell is a little tricky, but bash has the tools needed: take each line minus the /.gitignore at the end, either with a regexp replacement or by removing the last 11 characters (if I counted right) from any name that has a slash in it, i.e. anything other than the literal 10-character string .gitignore. Take the directory from the part before the /.gitignore name and use sed or awk to insert it, followed by a slash, in front of each non-comment entry (and remember to handle ! entries a little differently: the prefix goes after the !).
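A minimal sketch of that string processing, assuming POSIX sh and awk and paths in the form find prints them (./.gitignore, ./dir1/.gitignore). The function name prefix_gitignore is hypothetical, and this deliberately ignores the embedded-slash subtlety discussed further down:

```shell
# Hypothetical helper: print the entries of one .gitignore with its
# directory (relative to the top level) prepended.
# $1 is a path as printed by "find . -name .gitignore".
prefix_gitignore() {
    path=$1
    dir=${path%/.gitignore}   # strip the 11-character "/.gitignore" suffix
    dir=${dir#./}             # "./dir1" -> "dir1"; "." stays "."
    case $dir in
        . | .gitignore) prefix= ;;      # top level: prepend nothing
        *)              prefix=$dir/ ;;
    esac
    # Skip comments and blank lines; for "!" entries, put the prefix
    # after the "!" rather than before it.
    awk -v p="$prefix" '
        /^#/ || NF == 0 { next }
        {
            neg = ""
            line = $0
            if (substr(line, 1, 1) == "!") { neg = "!"; line = substr(line, 2) }
            if (p != "") sub(/^\//, "", line)   # "/foo" in dir1 means dir1/foo
            print neg p line
        }
    ' "$path"
}
```

Run from the top level with the example tree, prefix_gitignore ./dir1/.gitignore should print dir1/yet_another_path and dir1/*.txt.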

You are probably better off handling the top-level .gitignore separately (you can just copy it straight through, adding a final newline if necessary) and then dealing with subdirectory .gitignore files in a different code path.
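One way that split might look, as a sketch assuming POSIX sh, awk and find, run from the repository root. handle_nested is a hypothetical placeholder for whatever subdirectory logic you end up with:

```shell
# Hypothetical placeholder for the subdirectory code path.
handle_nested() {
    printf 'nested: %s\n' "$1"
}

flatten_all() {
    # "awk 1" prints every line unchanged and terminates the last line
    # with a newline even if the file itself lacks one.
    [ -f .gitignore ] && awk 1 .gitignore
    # Skip the .git directory and the top-level file already printed.
    find . -path ./.git -prune -o \
        -type f -name .gitignore ! -path ./.gitignore -print |
        while IFS= read -r f; do
            handle_nested "$f"
        done
}
```

Using awk 1 rather than cat is what takes care of the "adding a final newline if necessary" part.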

Note that a subdirectory .gitignore cannot refer to something above it: nothing in dir1/.gitignore can change whether ./foo or dir2/foo is ignored or not. So that part is not a problem.

The part that is a problem is that, in dir1, the entry:

*.txt

implies that the top level should ignore not only untracked dir1/*.txt files, but also dir1/sub/*.txt files, dir1/sub/sub2/*.txt files, and so on. However, an entry in dir1/.gitignore reading:

sub/*.txt

means that the top level should ignore only untracked dir1/sub/*.txt files, without ignoring any dir1/sub/sub2/*.txt files!

You may be able to salvage this with yet more code: while reading a subdirectory .gitignore, check whether any given line contains an embedded slash. An embedded slash is any slash other than a trailing one; a trailing slash is removed for the purpose of this particular distinction.

If the entry contains an embedded slash, it applies only to the full path relative to the subdirectory. You can therefore add dir1/ in front and be done, e.g. an entry foo/*.txt becomes:
```
 dir1/foo/*.txt 
```
If the entry does not contain an embedded slash, it applies to the subdirectory and all of its nested sub-subdirectories, so you will need to allow for any number of intermediate subdirectories. This might be correct, but it's quite untested:
```
dir1/*.txt
dir1/**/*.txt
```
(In theory **/ should also match the empty list of subdirectories, so only the second line should be needed, but in practice I have seen this not happen for some cases. I do not recall whether this was in other pathspecs, .gitignore files, or both.)
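The per-entry rule from the last few paragraphs could be sketched like this, assuming POSIX sh. translate_entry is a hypothetical name; it emits both lines for slash-free entries, as suggested above, and has not been checked against what git itself does:

```shell
# Hypothetical helper: translate one entry of a subdirectory .gitignore
# into top-level patterns. $1 = directory without trailing slash
# (e.g. "dir1"), $2 = the entry itself.
translate_entry() {
    dir=$1
    entry=$2
    neg=
    case $entry in
        '!'*) neg='!'; entry=${entry#?} ;;  # keep "!" in front of the prefix
    esac
    core=${entry%/}                         # a trailing slash does not count
    case $core in
        */*)
            # Embedded (or leading) slash: relative to $dir only.
            printf '%s%s/%s\n' "$neg" "$dir" "${entry#/}" ;;
        *)
            # No embedded slash: match in $dir and at any depth below it.
            printf '%s%s/%s\n' "$neg" "$dir" "$entry"
            printf '%s%s/**/%s\n' "$neg" "$dir" "$entry" ;;
    esac
}
```

For instance, translate_entry dir1 '*.txt' prints dir1/*.txt and dir1/**/*.txt, while translate_entry dir1 'sub/*.txt' prints only dir1/sub/*.txt.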

In practice, most .gitignore entries contain no embedded slash, so any successful script you write will probably produce a "flattened" ignore file nearly twice as long as its input.

Answer 2

You can produce a complete list of ignore patterns, each with a directory prefix, like this:

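#!/usr/bin/env sh

# For every .gitignore, print each pattern prefixed with its directory.
# Note: -printf is a GNU find extension.
find . -type f -name '.gitignore' -printf '%h\n' |
  while IFS= read -r dir_name; do
    # Skip comments, blank lines and lines starting with whitespace.
    # Reading the patterns line by line avoids word splitting and glob
    # expansion, which would otherwise turn *.txt into matching file names.
    sed -n '/^[^#[:space:]]/p' "$dir_name/.gitignore" |
      while IFS= read -r pattern; do
        printf '%s/%s\n' "$dir_name" "$pattern"
      done
  done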

The above code simply lists all patterns found in .gitignore files across directories and adds the directory as a prefix to each pattern.

It does not account for the gitignore syntax and matching rules described in the Git documentation: https://git-scm.com/docs/gitignore

2 answers

solution1
2 2019-08-04 19:57:34

solution2
2 2019-08-04 21:42:35