Looping through files in a dir; Pulling out filenames to replace string(s) in existing files

I have a directory of markdown files that I'm trying to accomplish the following with:

  • Grab the filename of the markdown file and store it in a variable
  • Take that variable and replace a series of strings in the file with the stored filename variable
  • Loop through all of the files in the directory and do the same thing

I'm close, but the following code is pulling out the filename of only the first markdown file and applying the variable to all strings in the files. Here's my working code so far:

#!/bin/bash

for file in /home/user/dir/*; do

  str="somestring"
  filename=$(basename $file)
  fn="$(echo "${filename%.*}")"

  find ./ -type f -exec sed -i '' -e "s/${str}/${fn}/g" {} \;

done

Assuming that the markdown file looks like this:

123456789.md and is at /home/user/dir/123456789.md with several other .md files with other, random numerical names.

Structure of the .md files is similar to:

---
layout: default
date: 2010-03-28
original: /orig/somestring.jpg
thumbnail: /thumb/somestring_thumb.jpg
permalink: /images/somestring/
---

and my goal would be for the script to make each file look like this, based on the filename of the .md file itself:

---
layout: default
date: 2010-03-28
original: /orig/123456789.jpg
thumbnail: /thumb/123456789_thumb.jpg
permalink: /images/123456789/
---

Any thoughts on the best way to edit the sed call, or another way to write this? Occasionally in my testing, sed returned sed: RE error: illegal byte sequence, but went through with the replacement of the string anyway, even when it was the wrong string.

Consider the following solution, which is fairly robust. It ensures that any character in your search string or in a Markdown filename that sed could interpret as a basic regular expression (BRE) metacharacter, or as a special character in the replacement, is treated as a literal.

Solution:

#!/usr/bin/env bash

target_dir=/path/to/dir
search='somestring'

search_escaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search")

while read -rd ''; do
  base=$(basename -- "$REPLY")
  replace_escaped=$(sed 's/[&/\]/\\&/g' <<<"${base%.*}")
  sed -i '' -e 's/'"$search_escaped"'/'"$replace_escaped/g"'' "$REPLY"
done < <(find "$target_dir" -depth 1 -type f -name '*.md' -print0)

Explanation:

  • The value for the target_dir variable should be defined as the pathname of the directory that you want to perform a search in. For instance /home/user/dir as specified in your question.

  • The value of the search variable should be changed to the string that you want to search for in your markdown ( .md ) files, and it must be enclosed in single quotes ( '...' ).

  • The line that reads:

     search_escaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search") 

    escapes potential BRE metacharacters that may exist in your search string and assigns the result to a new variable named search_escaped .

    We do this because the search string that you define ends up as the regexp in sed's s command, i.e. s/regexp/replacement/flags . Each character of your search string is placed in its own bracket expression [...] so that it is treated as a literal, except for caret ( ^ ) characters, which are escaped as \^ . Refer to this answer for further details.

    This means we can provide a search string such as s$om *e[s\t^ring , i.e. one with many metacharacters, and they will all be treated as literals instead of making the program go awry.

  • Using the find utility we define the following command to obtain the pathname of all .md files within the given target_dir :

     find "$target_dir" -depth 1 -type f -name '*.md' -print0 
    • The -depth 1 part ensures we only find files at the top level. This form is specific to BSD find (as shipped with macOS); with GNU find the equivalent is -maxdepth 1 . If you want to descend the directory tree recursively you can remove it, in which case .md files in sub-directories at any depth are included as well.

    • The -name '*.md' part ensures that we only include the Markdown files ( .md ) and exclude any other files which may exist in the given target_dir .

    • The find command is enclosed in <( ... ) , which is referred to as process substitution, and the preceding < redirects the pathnames found by find to the stdin of the while loop.

  • The while loop reads the output of the find command one NUL-terminated pathname at a time ( read -rd '' pairs with -print0 ), i.e. the pathname of each .md file found.

    In the body of the while loop we carry out the following tasks:

    • We obtain the basename from each pathname (Note: when read is given no variable name it stores what it reads in the built-in REPLY variable, so $REPLY holds one pathname on each turn of the loop):

       base=$(basename -- "$REPLY") 
    • The line that reads:

       replace_escaped=$(sed 's/[&/\]/\\&/g' <<<"${base%.*}") 

      escapes the characters that sed treats as special in the replacement part of the s command: & (the whole match), / (the delimiter) and \ (which introduces back-references such as \1 ). For example, if a file was named somefile\1\2\3.md the substitution would fail when we replace the search string with that name, and this escaping safeguards against that. Again, refer to this answer for further details.

      The ${base%.*} part utilizes parameter expansion to omit the file extension part (ie .md ) from the value of the base variable (ie from the filename/basename).

    • Finally, we replace all instances of the search string (ie the value of the $search_escaped variable) that may exist in the Markdown file with the value of the replace_escaped variable (ie the filename without the file extension).

       sed -i '' -e 's/'"$search_escaped"'/'"$replace_escaped/g"'' "$REPLY" 

Known issue: Any part of a basename may include newline characters ( \n ). Whilst this solution correctly handles the discovery of such a pathname using the methods described here, it does not currently perform the string replacement when the filename contains newline characters.
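
To see the two escaping steps in isolation, here is a minimal sketch that feeds text through a pipe instead of editing files. The function names escape_search and escape_replacement , the sample search string and the sample name are all made up for this illustration. The replacement helper uses | as its own delimiter, which is a small deviation from the command above, so that the / inside the bracket expression cannot be mistaken for a delimiter.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the two escaping commands described above.

# Make every character of a search string literal in a BRE: each character
# goes into its own bracket expression, and carets become \^ .
escape_search() {
  printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

# Escape the characters that are special in a sed replacement: & / \
escape_replacement() {
  printf '%s\n' "$1" | sed 's|[&/\]|\\&|g'
}

search='some.str[1]*'   # made-up search string full of metacharacters
name='AT&T/2024'        # made-up name containing & and /

search_escaped=$(escape_search "$search")
replace_escaped=$(escape_replacement "$name")

# The first input line contains the search string literally. The second line
# would match the unescaped pattern, but must stay untouched here.
result=$(printf '%s\n' 'id: some.str[1]*' 'id: someXstr111' |
  sed "s/${search_escaped}/${replace_escaped}/g")

printf '%s\n' "$result"
```

Only the first line should change, to id: AT&T/2024 , which shows that the search string was matched as plain text and that the & and / in the name were inserted as-is.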

If I'm understanding correctly, the following would work:

#!/bin/bash

for file in /home/user/dir/*; do

    str="somestring"
    filename=$(basename "$file")
    fn=${filename%.*}

    LANG=C sed -i '' -e "s/${str}/${fn}/g" "$file"

done

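The problem is that you are executing find and sed inside the for loop. On every iteration, find ./ runs sed against every file below the current directory, so the first iteration already replaces the string in all files with the first filename, and later iterations find nothing left to replace. Passing only "$file" to sed restricts each replacement to the file the name came from.
Setting LANG=C before sed is a common workaround for the sed: RE error: illegal byte sequence problem. If LC_ALL or LC_CTYPE is set in your environment it takes precedence over LANG , in which case LC_ALL=C is the variable to set.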
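
As a rough sketch of the same per-file loop that can be tried safely, the following builds a throwaway directory with two sample files and runs the replacement there. The directory, the file contents and the .tmp suffix are assumptions made for the demo. It writes through a temporary file rather than using -i , because BSD sed expects -i '' while GNU sed expects -i without the empty argument, and it globs only *.md so that other files are left alone.

```bash
#!/usr/bin/env bash
# Demo setup: a scratch directory with two made-up Markdown files.
dir=$(mktemp -d)
printf 'original: /orig/somestring.jpg\npermalink: /images/somestring/\n' > "$dir/123456789.md"
printf 'original: /orig/somestring.jpg\n' > "$dir/987654321.md"

str='somestring'

for file in "$dir"/*.md; do
  [ -f "$file" ] || continue        # skip when the glob matched nothing
  filename=$(basename -- "$file")
  fn=${filename%.*}                 # filename without its extension

  # Edit only this file, via a temporary copy next to it.
  # Note: the moved file gets default permissions, not the original's.
  LC_ALL=C sed -e "s/${str}/${fn}/g" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
done

cat "$dir/123456789.md" "$dir/987654321.md"
```

Each file ends up containing its own name in place of somestring . This sketch does not escape str or fn , so it assumes plain names like the numeric ones in the question.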
