I have a directory of markdown files that I'm trying to accomplish the following with:
I'm close, but the following code is pulling out the filename of only the first markdown file and applying the variable to all strings in the files. Here's my working code so far:
#!/bin/bash
for file in /home/user/dir/*; do
  str="somestring"
  filename=$(basename $file)
  fn="$(echo "${filename%.*}")"
  find ./ -type f -exec sed -i '' -e "s/${str}/${fn}/g" {} \;
done
Assuming that the markdown file looks like this:
123456789.md
and is at /home/user/dir/123456789.md
with several other .md files with other, random numerical names.
Structure of the .md files is similar to:
---
layout: default
date: 2010-03-28
original: /orig/somestring.jpg
thumbnail: /thumb/somestring_thumb.jpg
permalink: /images/somestring/
---
and my goal would be for the script to make each file look like this, based on the filename of the .md file itself:
---
layout: default
date: 2010-03-28
original: /orig/123456789.jpg
thumbnail: /thumb/123456789_thumb.jpg
permalink: /images/123456789/
---
Any thoughts on the best way to edit the sed call, or another way to write this? Occasionally in my testing, sed returned sed: RE error: illegal byte sequence, but went through with the replacement of the string anyway, even if it was the wrong string.
Consider using the following solution, which is fairly robust. It ensures that any character in your search string that could be interpreted as a basic regular expression (BRE) metacharacter, and any character in the Markdown filename that is special in a sed replacement, is treated as a literal.
#!/usr/bin/env bash
target_dir=/path/to/dir
search='somestring'
search_escaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search")
while read -rd ''; do
  base=$(basename -- "$REPLY")
  replace_escaped=$(sed 's/[&/\]/\\&/g' <<<"${base%.*}")
  sed -i '' -e 's/'"$search_escaped"'/'"$replace_escaped/g"'' "$REPLY"
done < <(find "$target_dir" -depth 1 -type f -name '*.md' -print0)
Explanation:
The value of the target_dir variable should be the pathname of the directory that you want to search in, for instance /home/user/dir as specified in your question.
The value of the search variable should be changed to the string that you want to search for in your Markdown (.md) files, and it must be enclosed in single quotes ('...').
The line that reads:
search_escaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search")
escapes potential BRE metacharacters that may exist in your search string and assigns the result to a new variable named search_escaped.
We do this because the search string that you define will ultimately be used as the regexp in sed's s command, i.e. s/regexp/replacement/flags. Essentially, each character of your given search string is placed in its own bracket expression [...] so that it is treated as a literal, except for caret (^) characters, which are escaped as \^ instead. Refer to this answer for further details.
This means we can provide a search string such as s$om *e[s\t^ring, i.e. one with many metacharacters, and they will all be treated as literals, which prevents our program from going awry.
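To see what that escaping step produces, here is a small sketch. The function name escape_search is my own, purely for illustration; the sed expression is the one from the script above.

```bash
#!/usr/bin/env bash
# escape_search is a hypothetical helper name, not part of the script above.
# It wraps every character except ^ in its own bracket expression and
# backslash-escapes ^, so the result matches the input literally as a BRE.
escape_search() {
  sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"
}

escape_search 'a.b*c'   # prints [a][.][b][*][c]
escape_search 'x^2'     # prints [x]\^[2]
```

The caret gets a backslash instead of brackets because [^] is not a valid bracket expression: a leading ^ inside brackets means negation.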
Using the find utility, we define the following command to obtain the pathnames of all .md files within the given target_dir:
find "$target_dir" -depth 1 -type f -name '*.md' -print0
The -depth 1 part ensures we only find files at the top level. If you want to descend the directory tree recursively instead, remove it; find will then also include any .md files in sub-directories, however many levels deep.
The -name '*.md' part ensures that we only include the Markdown (.md) files and exclude any other files which may exist in the given target_dir.
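As a rough illustration of the top-level versus recursive choice, here is a sketch. Note that -depth 1 with an argument is a BSD/macOS find feature; this sketch uses -maxdepth 1 instead, which both BSD and GNU find accept. The helper name list_md and the demo directory are assumptions of mine, not part of the script above.

```bash
#!/usr/bin/env bash
# list_md is a hypothetical helper: it prints the NUL-delimited pathnames of
# the .md files in directory $1. Pass "recursive" as $2 to descend into
# sub-directories as well.
list_md() {
  local dir=$1 mode=${2:-top}
  if [[ $mode == recursive ]]; then
    find "$dir" -type f -name '*.md' -print0
  else
    # -maxdepth 1 limits the search to the directory itself.
    find "$dir" -maxdepth 1 -type f -name '*.md' -print0
  fi
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/1.md" "$demo/2.md" "$demo/notes.txt" "$demo/sub/3.md"
echo "top level only:"
list_md "$demo" | tr '\0' '\n' | sort
echo "recursive:"
list_md "$demo" recursive | tr '\0' '\n' | sort
rm -rf "$demo"
```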
The find command is enclosed in <( ... ), which is referred to as process substitution, and the preceding < redirects the pathnames found by find to the stdin of the while loop.
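The while loop reads the results of the find command, i.e. the pathname of each .md file found. Because find emits the pathnames NUL-delimited (-print0) and read is given an empty delimiter (-d ''), pathnames containing spaces or other unusual characters are read intact.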
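The read loop on its own can be sketched as follows. The helper name collect_md and the array md_files are hypothetical, and -maxdepth 1 stands in for the BSD-only -depth 1 so the sketch also runs with GNU find.

```bash
#!/usr/bin/env bash
# collect_md is a hypothetical helper: it fills the global array md_files
# with the top-level .md pathnames found in directory $1.
collect_md() {
  md_files=()
  # read -d '' stops at each NUL byte; with no variable name given, the
  # pathname is stored in REPLY with its whitespace preserved.
  while read -rd ''; do
    md_files+=("$REPLY")
  done < <(find "$1" -maxdepth 1 -type f -name '*.md' -print0)
}

# Demo in a throwaway directory, including a name with a space in it.
demo=$(mktemp -d)
touch "$demo/plain.md" "$demo/with space.md" "$demo/other.txt"
collect_md "$demo"
printf 'found %d file(s)\n' "${#md_files[@]}"
rm -rf "$demo"
```

Feeding the loop through process substitution rather than a pipe keeps the loop in the current shell, so anything it sets (here md_files) is still available after done.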
In the body of the while loop we carry out the following tasks:
We obtain the basename from each pathname (note: $REPLY is a built-in variable that read assigns its input to when it is given no variable name; here it holds the current pathname during each turn of the loop):
base=$(basename -- "$REPLY")
The line that reads:
replace_escaped=$(sed 's/[&/\]/\\&/g' <<<"${base%.*}")
escapes characters in the filename that sed would otherwise treat as special in the replacement, such as the backreference \1. For example, if a file were named somefile\1\2\3.md, using that name unescaped as the replacement would fail; this safeguards against that. Again, refer to this answer for further details.
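Here is a quick sketch of what the replacement escaping does. The function name escape_replacement is hypothetical; the sed expression is the one from the script above.

```bash
#!/usr/bin/env bash
# escape_replacement is a hypothetical helper name, not part of the script
# above. It puts a backslash in front of every &, / and \ so that sed inserts
# those characters literally when the result is used as a replacement.
escape_replacement() {
  sed 's/[&/\]/\\&/g' <<<"$1"
}

escape_replacement 'a&b/c'        # prints a\&b\/c
escape_replacement 'somefile\1'   # prints somefile\\1
```

The slash is included because / is the delimiter of the s command in the script; with a different delimiter you would escape that character instead.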
The ${base%.*} part utilizes parameter expansion to omit the file extension (i.e. .md) from the value of the base variable (i.e. from the filename/basename).
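A minimal sketch of how that expansion behaves on a few names. The helper strip_ext is hypothetical and only there to make the cases easy to try.

```bash
#!/usr/bin/env bash
# strip_ext is a hypothetical helper: ${1%.*} removes the shortest suffix
# matching ".*", i.e. the last dot and everything after it.
strip_ext() {
  printf '%s\n' "${1%.*}"
}

strip_ext '123456789.md'     # prints 123456789
strip_ext 'archive.tar.gz'   # prints archive.tar
strip_ext 'README'           # prints README (no dot, so nothing is removed)
```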
Finally, we replace all instances of the search string (i.e. the value of the search_escaped variable) that may exist in the Markdown file with the value of the replace_escaped variable (i.e. the filename without the file extension):
sed -i '' -e 's/'"$search_escaped"'/'"$replace_escaped/g"'' "$REPLY"
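If you want to try the two escaping steps together without touching any files, something like the following sketch may help. It reads from stdin and writes to stdout instead of using sed -i '' (that spelling of -i is the BSD/macOS form), and the function name replace_literal is my own.

```bash
#!/usr/bin/env bash
# replace_literal is a hypothetical helper: it replaces every literal
# occurrence of $1 with $2 in the text read from stdin.
replace_literal() {
  local search_escaped replace_escaped
  # Same escaping as in the script above.
  search_escaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1")
  replace_escaped=$(sed 's/[&/\]/\\&/g' <<<"$2")
  sed -e "s/$search_escaped/$replace_escaped/g"
}

printf 'original: /orig/somestring.jpg\n' | replace_literal 'somestring' '123456789'
# prints original: /orig/123456789.jpg

printf 'a.b axb\n' | replace_literal 'a.b' 'x&y'
# prints x&y axb
```

In the second call the dot only matches a literal dot and the & is inserted as-is, which is the point of escaping both sides.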
Known issue: any part of a basename can include newline characters (\n). Whilst this solution correctly handles the discovery of such a pathname using the methods described here, it does not currently perform the string replacement when the filename contains newline characters.
If I'm understanding correctly, the following would work:
#!/bin/bash
for file in /home/user/dir/*; do
  str="somestring"
  filename=$(basename "$file")
  fn=${filename%.*}
  LANG=C sed -i '' -e "s/${str}/${fn}/g" "$file"
done
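The problem is that you are executing find and sed inside the for loop, so every iteration replaces the string in every file under the current directory, not just in the file being processed. Since the first iteration already replaces all occurrences, every file ends up with the first filename.
Prefixing sed with LANG=C is a common workaround for the sed: RE error: illegal byte sequence problem.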
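As a rough sketch of that workaround on input that is not valid UTF-8: the helper name replace_bytewise is hypothetical, and it sets LC_ALL rather than LANG because LC_ALL takes precedence, so LANG=C alone has no effect if LC_ALL is already set in your environment.

```bash
#!/usr/bin/env bash
# replace_bytewise is a hypothetical helper: it runs sed in the C locale so
# the input is treated as raw bytes rather than decoded as UTF-8.
# $1 is used as a regular expression and $2 as a sed replacement, unescaped.
replace_bytewise() {
  LC_ALL=C sed "s/$1/$2/g"
}

# \351 is a lone Latin-1 byte (0xE9), which is not valid UTF-8 on its own.
printf 'caf\351 somestring\n' | replace_bytewise 'somestring' '123456789' | od -c
```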