
How to extract specific parts of a path and filename in Linux

My current task is renaming a whole lot of files across multiple directories to different identifiers.

So I have several directories like b01, b02, b03, etc. Within each directory are files with names such as img01.23495.png, img01.3596596.png, img02.2399495.png, etc.

I have to rename the img01 files in b01 to some other identifier, so the new identifier depends on both the directory name and the first part of the filename.

My plan for the pipeline is this: get all the PNG filenames, extract the folder each one is in, extract the img## part, and store that information in a file, so I'd get a file with something like:

b01 img01
b01 img02
b02 img01
...

This is useful because I can afterwards add the new identifier as a third column, then read the file back in to perform the actual renaming.
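The renaming step described above could look something like the following sketch. It is one possible approach, not the poster's own: rename_from_map is a hypothetical helper name, the mapping file is assumed to hold lines of the form "b01 img01 newname", and only the part of each file name before the first dot is assumed to be replaced (so b01/img01.2342394.png would become b01/newname.2342394.png).

```shell
# Hypothetical helper: rename files according to a three-column mapping file.
# Assumed line format: <dir> <old-prefix> <new-identifier>, e.g. "b01 img01 cat_a".
# Assumption: only the prefix before the first '.' changes; the numeric part
# and the ".png" extension are kept.
rename_from_map() {
  map=$1           # path to the mapping file
  root=${2:-.}     # directory tree to search (default: current directory)
  while read -r dir prefix newid; do
    [ -n "$newid" ] || continue    # skip lines that have no third column yet
    find "$root" -type f -path "*/$dir/$prefix.*.png" |
      while IFS= read -r f; do
        base=${f##*/}              # e.g. img01.2342394.png
        rest=${base#"$prefix"}     # e.g. .2342394.png
        mv -- "$f" "${f%/*}/$newid$rest"
      done
  done < "$map"
}
```

After filling in the third column of identifiers.txt, running rename_from_map identifiers.txt . from the top-level directory would rename the matching files. Lines without a third column are skipped, so the mapping can be filled in gradually, and replacing mv with echo mv gives a dry run first. The sketch also assumes that no new identifier collides with an existing img## prefix in the same directory.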

Currently, I have paths such as ./images/something/b01/img01.2342394.png.
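Given a path in that shape, the directory name and the img## prefix can also be pulled out with plain POSIX parameter expansion instead of sed. A minimal sketch (list_groups is a hypothetical name), assuming the prefix is everything before the first dot in the file name and that no path contains a newline:

```shell
# Hypothetical helper: print "<dir> <prefix>" pairs (e.g. "b01 img01")
# for every .png file under the given tree, without duplicates.
list_groups() {
  find "${1:-.}" -type f -name '*.png' |
    while IFS= read -r path; do
      file=${path##*/}        # strip directories: img01.2342394.png
      parent=${path%/*}       # strip file name:   ./images/something/b01
      dir=${parent##*/}       # last directory:    b01
      prefix=${file%%.*}      # before first '.':  img01
      printf '%s %s\n' "$dir" "$prefix"
    done | sort -u
}
```

Running list_groups . > identifiers.txt would then produce the two-column file. Unlike a regex that requires a numeric middle part, this version lists every .png file it finds.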

To get the list, I am currently trying something like: find . | grep png | something sed | sort | uniq > identifiers.txt

I'm stuck on the sed part, however. Any other suggestions for doing what I'm trying to do are welcome as well.

find . -name "*.png" | sed 's#^.*/\([^/]*\)/\([^/.]*\)\.[0-9]\+\.png$#\1 \2#' | sort -u

Sorry I can't fully test that; I'm at work and stuck on OS X, whose BSD sed differs from GNU sed in places (for instance, the \+ above is a GNU extension to basic regular expressions). Anyway, the core of the solution (besides using the -name test for find and the -u flag for sort) is the sed regular expression. You seem to have a handle on it, but I'll explain the whole thing in case it helps anyone else who finds this:

s - Search and Replace
  # - Delimiter (Search pattern)
    ^ - Beginning of a line
    . - Any character
    * - zero or more times
    / - a literal '/'
    \( - start a capturing group
      [^/]* - Any character except '/', zero or more times
    \) - End capturing group (#1)
    / - a literal '/'
    \( - start a capturing group
      [^/.]* - Any character except '/' or '.', zero or more times
    \) - End capturing group (#2)
    \. - a literal '.'
    [0-9] - a digit
    \+ - one or more times
    \.png - a literal '.png'
    $ - end of the line
  # - Delimiter, now starting the replace pattern
    \1 - the contents of the first capturing group
    ' ' - a literal space
    \2 - the contents of the second capturing group
  # - Delimiter.  End of all patterns.
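To check the pattern without touching the file system, the sed expression above can be fed a few sample paths (the paths and the samples helper are made up for illustration). The sketch also shows two variations that are not part of the original answer: -n with the p flag, so that non-matching lines are dropped instead of printed unchanged, and an -E (extended regular expression) form, which avoids the GNU-only \+; -E is accepted by GNU sed and by the BSD sed on OS X.

```shell
# Made-up sample paths; the last one has no numeric middle part.
samples() {
  printf '%s\n' \
    ./images/something/b01/img01.2342394.png \
    ./images/something/b02/img03.77.png \
    ./images/something/b03/cover.png
}

# The answer's command (needs GNU sed for \+): non-matching lines pass through.
# prints: b01 img01 / b02 img03 / ./images/something/b03/cover.png
samples | sed 's#^.*/\([^/]*\)/\([^/.]*\)\.[0-9]\+\.png$#\1 \2#'

# With -n and the p flag, only lines where the substitution succeeded print.
# prints: b01 img01 / b02 img03
samples | sed -n 's#^.*/\([^/]*\)/\([^/.]*\)\.[0-9]\+\.png$#\1 \2#p'

# Same pattern as an extended regex: ( ) and + need no backslashes.
# prints: b01 img01 / b02 img03
samples | sed -n -E 's#^.*/([^/]*)/([^/.]*)\.[0-9]+\.png$#\1 \2#p'
```

For the real tree, replace samples with find . -name "*.png" and append | sort -u, as in the answer.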


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM