Remove specific words from a text file in bash

Question

I want to remove specific words from a txt file in bash. Here is my current script:

echo "Sequenzia Import Tag Sidecar Processor v0.2"
echo "=============================================================="
rootfol=$(pwd)
echo "Selecting files from current folder........"
images=$(ls *.jpg *.jpeg *.png *.gif)
echo "Converting sidecar files to folders........"
for file in $images
do
    split -l 8 "$file.txt" tags-
    for block in tags-*
    do
        foldername=$(cat "$rootfol/$block" | tr '\r\n' ' ')
        FOO_NO_EXTERNAL_SPACE="$(echo -e "${foldername}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
        mkdir "$FOO_NO_EXTERNAL_SPACE" > /dev/null
        cd "$FOO_NO_EXTERNAL_SPACE"
    done
    mv "$rootfol/$file" "$file"
    cd "$rootfol"
    rm tags-* $file.txt
done
echo "DONE! Move files to import folder"

What it does is read the txt file that has the same name as an image and create folders that are interpreted as tags during an import into a Sequenzia image board (based on myimoutobooru) (https://code.acr.moe/kazari/sequenzia). What I want to do is remove specific words (actually they are symbol combinations) from the sidecar file so that they do not cause issues with the import process.

I want to remove combinations like ">_<" and ":o" from the file.

What can I add to my current script that lets me do this with a list of illegal words?

Answer 1

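You can create a file that lists your illegal strings, iterate over the lines of that file, and use a regular expression to remove each one from your input.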
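A minimal sketch of that idea, under some assumptions: the file names illegal.txt and sidecar.txt and the sample contents are made up for illustration, and each listed string is escaped first so that sed matches it literally rather than as a regex.

```shell
# Hypothetical file names: illegal.txt holds one unwanted string per line,
# sidecar.txt is the tag file to clean. Work in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf '%s\n' '>_<' ':o' '^_^' > illegal.txt
printf '%s\n' 'smile >_< happy' 'cat :o dog' 'wink ^_^ end' > sidecar.txt

while IFS= read -r word; do
    [ -n "$word" ] || continue          # skip blank lines in the list
    # Backslash-escape characters that are special in a basic regex
    # (and the / delimiter) so the string is matched literally.
    pattern=$(printf '%s\n' "$word" | sed 's,[][\\.*^$/],\\&,g')
    # One sed run per listed string: write to a temp file, then replace.
    sed "s/$pattern//g" sidecar.txt > sidecar.tmp && mv sidecar.tmp sidecar.txt
done < illegal.txt

cat sidecar.txt
```

Note that this reads the sidecar file once per listed string, which is fine for small tag files but wasteful for large ones.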

Answer 2

Before the line split -l 8 "$file.txt" tags- I suggest you clean up $file.txt using something like:

sed -f sedscript <"$file.txt" >tempfile

sedscript is a file that you create beforehand containing all your unwanted strings, e.g.:

s/>_<//g
s/:o//g

You'd change your split command to use tempfile.
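Put together, the steps above might look like the sketch below. The image name picture.jpg and the sample tags are assumptions for illustration only; sedscript, tempfile and the tags- prefix are the names used in this answer and in the question's script.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sidecar file for an image named picture.jpg.
file=picture.jpg
printf '%s\n' 'tag one >_<' 'tag :o two' > "$file.txt"

# The sed script: one substitution command per unwanted string.
cat > sedscript <<'EOF'
s/>_<//g
s/:o//g
EOF

# Clean the sidecar file into tempfile, then split tempfile
# instead of the original sidecar file.
sed -f sedscript <"$file.txt" >tempfile
split -l 8 tempfile tags-

cat tags-aa
```

In the real script you would probably also want to delete tempfile together with the tags-* files at the end of each loop iteration.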

Experimenting with stdin/stdout on my PC suggests that multiple substitutions in a sed script are applied in the same pass over the input file. Therefore, if the file is large, this approach avoids reading the file multiple times.

Another variant of this approach is:

sed -e 's/>_<//g' -e 's/:o//g' <infile >outfile

Repeat the

-e 's/xxx//g'

option as many times as required. The quotes are needed so that the shell does not treat > and < as redirections.
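If the unwanted strings live in a list file, the repeated -e options can be generated rather than typed by hand. The following is only a sketch under assumptions: illegal.txt, infile and outfile are placeholder names, and the list must contain at least one string, otherwise sed is called without a script and fails.

```shell
# Work in a scratch directory with made-up sample data.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' '>_<' ':o' > illegal.txt
printf '%s\n' 'smile >_< happy' 'cat :o dog' > infile

# Collect the options in the positional parameters.
# Note: this overwrites any arguments the script was called with.
set --
while IFS= read -r word; do
    [ -n "$word" ] || continue          # skip blank lines in the list
    # Escape regex-special characters and the / delimiter so the
    # string is removed literally.
    esc=$(printf '%s\n' "$word" | sed 's,[][\\.*^$/],\\&,g')
    set -- "$@" -e "s/$esc//g"
done < illegal.txt

# A single sed invocation applies every substitution.
sed "$@" <infile >outfile
cat outfile
```

The positional parameters are used so that the sketch also runs in plain sh; in bash an array would do the same job.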

2 answers

Answer 1
0 2016-11-16 19:49:34

Answer 2
0 accepted 2016-11-16 20:20:42