Recursively concatenating (joining) and renaming text files in a directory tree

I am using Mac OS X Lion.

I have a folder, LITERATURE, with the following structure:

LITERATURE > Y > YATES, DORNFORD > THE BROTHER OF DAPHNE:
  Chapters 01-05.txt
  Chapters 06-10.txt
  Chapters 11-end.txt

I want to recursively concatenate the chapters that are split into multiple files (not all are). Then, I want to write the concatenated file to its parent's parent directory. The name of the concatenated file should be the same as the name of its parent directory.

For example, after running the script (in the folder structure shown above) I should get the following.

LITERATURE > Y > YATES, DORNFORD:
  THE BROTHER OF DAPHNE.txt
  THE BROTHER OF DAPHNE:
    Chapters 01-05.txt
    Chapters 06-10.txt
    Chapters 11-end.txt

In this example, the parent directory is THE BROTHER OF DAPHNE and the parent's parent directory is YATES, DORNFORD.


[Updated March 6th: Rephrased the question/answer so that the question/answer is easy to find and understand.]

It's not clear what you mean by "recursively" but this should be enough to get you started.

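#!/bin/bash
# Needs Bash 4 or newer: the ,, and ^ case-conversion expansions do not exist in Bash 3.2.

titlecase () {  # adapted from http://stackoverflow.com/a/6969886/874188
    local arr
    arr=("${@,,}")      # lowercase every word
    echo "${arr[@]^}"   # capitalize the first letter of each word
}

for book in LITERATURE/?/*/*; do
    # Deliberately unquoted so that each word of the title becomes a separate argument.
    title=$(titlecase ${book##*/})
    # Match only *.txt so the output file is never picked up as input.
    for file in "$book"/*.txt; do
        cat "$file"
        echo
    done >"$book/$title"
    echo '# not doing this:' rm "$book"/*.txt
done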

This loops over LITERATURE / initial / author / BOOK TITLE and creates a file Book Title (where should a space be added?) from the catenated files in each book directory. (I would generate it in the parent directory and then remove the book directory completely, assuming it contains nothing of value any longer.) There is no recursion, just a loop over this directory structure.
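The parenthetical suggestion above, writing the joined file next to the book directory rather than inside it, might look like the following sketch. The name join_books is hypothetical, the .txt suffix on the output follows the example in the question, and nothing is deleted.

```shell
#!/bin/bash

# join_books ROOT: hypothetical helper. For every ROOT/<initial>/<author>/<book>/
# directory, write the joined chapter files to ROOT/<initial>/<author>/<book>.txt.
join_books () {
    local root=$1 book file out
    for book in "$root"/?/*/*/; do        # trailing slash: directories only
        [ -d "$book" ] || continue        # the glob matched nothing
        book=${book%/}                    # drop the trailing slash
        set -- "$book"/*.txt
        [ -e "$1" ] || continue           # no chapter files in this directory
        out="$book.txt"                   # sibling of the book directory
        for file in "$@"; do
            cat "$file"
            echo                          # blank line between parts, as in the loop above
        done >"$out"
    done
}
```

Calling join_books LITERATURE from the directory that contains LITERATURE would then leave THE BROTHER OF DAPHNE.txt beside the THE BROTHER OF DAPHNE directory. Because every expansion is quoted, spaces and commas in the names need no special handling.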

Removing the chapter files is a bit risky so I'm not doing it here. You could remove the echo prefix from the line after the first done to enable it.

If you have book names which contain an asterisk or some other shell metacharacter, this will be rather more complex -- the title assignment assumes you can use the book title unquoted.

Only the parameter expansion with case conversion is beyond the very basics of Bash. The array operations could perhaps also be a bit scary if you are a complete beginner. Proper understanding of quoting is also often a challenge for newcomers.
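For readers new to these expansions, here is a small step-by-step sketch of what the script does to one path. The variable names are only for illustration, and the case-conversion forms assume Bash 4 or newer (the /bin/bash shipped with OS X is 3.2, which lacks them).

```shell
#!/bin/bash
# Illustration only; requires Bash 4+ for the ,, and ^ expansions.

book='LITERATURE/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE'

# ${book##*/} removes the longest prefix matching "*/", leaving the last path component.
name=${book##*/}
echo "$name"

# Unquoted, $name is split on whitespace: one array element per word.
# (It is also subject to globbing, which is why metacharacters in titles are a problem.)
words=($name)

# ,, lowercases every element; ^ uppercases the first character of every element.
lower=("${words[@],,}")
title="${lower[*]^}"
echo "$title"
```

The first echo prints the upper-case directory name and the second prints it in title case.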

cat Chapters*.txt > FinaleFile.txt.raw
Chapters="$( ls -1 Chapters*.txt | sed -n 'H;${x;s/\
//g;s/ *Chapters //g;s/\.txt/ /g;s/ *$//p;}' )"
mv FinaleFile.txt.raw "FinaleFile ${Chapters}.txt"
  • cat all the .txt files at once (assuming the list is sorted by name)
  • take the chapter numbers/references from the ls of the folder and use sed to adapt the format
  • rename the concatenated file so that its name includes the chapters
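The chapter-label step can also be sketched without sed, using only parameter expansion. This is an alternative illustration rather than the method above; chapter_labels is a hypothetical name, and the "Chapters " prefix and ".txt" suffix are assumed from the file names in the question.

```shell
#!/bin/bash

# chapter_labels FILE...: hypothetical helper that prints the chapter ranges
# of the given files on one line, separated by single spaces.
chapter_labels () {
    local f label out=
    for f in "$@"; do
        label=${f##*/}             # drop any directory part
        label=${label#Chapters }   # drop the "Chapters " prefix
        label=${label%.txt}        # drop the ".txt" suffix
        out+="${out:+ }$label"     # add a separating space after the first label
    done
    printf '%s\n' "$out"
}
```

Inside a book directory, something like mv FinaleFile.txt.raw "FinaleFile $(chapter_labels Chapters*.txt).txt" would then produce the same kind of name as the sed version.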

Thanks for all your input. It got me thinking, and I managed to concatenate the files using the following steps:


  1. This script replaces spaces in filenames with underscores.

#!/bin/bash

# We are going to iterate through the directory tree, up to a maximum depth of 20.
# Going one level deeper on each pass means a directory is renamed before its contents are visited.
for i in $(seq 1 20)
  do

# In UNIX based systems, directories are files too (Everything is a File!).
# The 'find' command lists all files which contain spaces in their names. The | (pipe) …
# … forwards the list to a 'while' loop that iterates through each file in the list.
# 'IFS= read -r' keeps leading spaces and backslashes in the names intact.
    find . -maxdepth "$i" -name '* *' | while IFS= read -r file
    do

# Here, we use 'sed' to replace spaces in the filename with underscores.
# The 'echo' prints a message to the console before renaming the file using 'mv'.
      item=$(echo "$file" | sed 's/ /_/g')
      echo "Renaming '$file' to '$item'"
      mv "$file" "$item"
    done
done
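As an aside, the same renaming can probably be done in a single pass by letting find visit the deepest entries first. The sketch below is an alternative under that assumption, not the script above; underscore_names is a hypothetical name, and it relies on the -depth, -mindepth and -print0 options of find.

```shell
#!/bin/bash

# underscore_names [ROOT]: hypothetical helper that replaces spaces with
# underscores in the names of all files and directories below ROOT.
underscore_names () {
    local root=${1:-.} path dir base
    # -depth lists a directory's contents before the directory itself, so a
    # child is renamed while its parent's path is still valid.
    # -mindepth 1 keeps ROOT itself from being renamed.
    find "$root" -mindepth 1 -depth -name '* *' -print0 |
    while IFS= read -r -d '' path; do
        dir=${path%/*}       # everything before the last slash
        base=${path##*/}     # the last path component
        mv -- "$path" "$dir/${base// /_}"
    done
}
```

Only the last path component is rewritten on each mv, so the fixed limit of 20 levels is no longer needed.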

  2. This script concatenates text files whose names start with Part, Chapter, Section, or Book.

#!/bin/bash

# Here, we go through all the directories (up to a depth of 20).
# Splitting the 'find' output into words is safe only because the previous step removed all spaces.
for D in $(find . -maxdepth 20 -type d)
do

# Check if the parent directory contains any files of interest.
    if ls "$D"/Part*.txt &>/dev/null ||
       ls "$D"/Chapter*.txt &>/dev/null ||
       ls "$D"/Section*.txt &>/dev/null ||
       ls "$D"/Book*.txt &>/dev/null
      then

# If we get here, then there are split files in the directory; we will concatenate them.
# First, we trim the full directory path ($D) so that we are left with the path to the …
# … files' parent's parent directory. We will write the concatenated file here. (✝)
        ppdir="$(dirname "$D")"

# Here, we concatenate the files using 'cat'. The 'basename' command extracts the name of …
# … the parent directory from the full directory path ($D) and gives us the filename.
# Finally, we write the concatenated file to its parent's parent directory. (✝)
        cat "$D"/*.txt > "$ppdir/$(basename "$D").txt"
    fi
done
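To make the "parent's parent" bookkeeping concrete, this sketch computes only the output path for one directory. The function name target_for and the sample path are illustrative assumptions; the path uses underscores because the first step has already replaced the spaces.

```shell
#!/bin/bash

# target_for DIR: hypothetical helper that prints the path the concatenated
# file for DIR would be written to: <parent of DIR>/<name of DIR>.txt
target_for () {
    local d=${1%/}                       # ignore one trailing slash, if any
    printf '%s/%s.txt\n' "$(dirname "$d")" "$(basename "$d")"
}

target_for "./Y/YATES,_DORNFORD/THE_BROTHER_OF_DAPHNE"
```

For the sample path this prints ./Y/YATES,_DORNFORD/THE_BROTHER_OF_DAPHNE.txt, which sits beside the book directory rather than inside it.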

  3. Now, we delete all the files that we concatenated so that their parent directory is left empty.

    • find . -name 'Part*' -delete
    • find . -name 'Chapter*' -delete
    • find . -name 'Section*' -delete
    • find . -name 'Book*' -delete

  4. The following command will delete empty directories. (✝) We wrote the concatenated file to its parent's parent directory so that its parent directory is left empty after deleting all the split files.

    • find . -type d -empty -delete
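Since -delete is irreversible, it may be worth listing the candidates first. The sketch below is a cautious preview and not part of the steps above: preview_split_files is a hypothetical name, and it narrows the match to regular files ending in .txt, whereas the delete commands above match on the name prefix alone.

```shell
#!/bin/bash

# preview_split_files [ROOT]: hypothetical helper that prints, without deleting
# anything, the regular .txt files under ROOT whose names start with
# Part, Chapter, Section, or Book.
preview_split_files () {
    find "${1:-.}" -type f \( -name 'Part*.txt' -o -name 'Chapter*.txt' \
        -o -name 'Section*.txt' -o -name 'Book*.txt' \) -print
}
```

Once the printed list looks right, replacing -print with -delete in the same expression would remove exactly those files.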


Shell doesn't like white space in names. However, over the years, Unix has come up with some tricks that'll help:

$ find . -name "Chapters*.txt" -type f -print0 | xargs -0 cat >> final_file.txt

Might do what you want.

The find command recursively finds all of the directory entries in a file tree that match the query (in this case, the type must be a file, and the name must match the pattern Chapters*.txt).

Normally, find separates out the directory entry names with NL, but the -print0 says to separate out the entry names with the NUL character. The NL is a valid character in a file name, but NUL isn't.

The xargs command takes the output of the find and processes it. xargs gathers all the names and passes them in bulk to the command you give it -- in this case the cat command.

Normally, xargs separates out files by white space, which means Chapters would be one file and 01-05.txt would be another. However, the -0 tells xargs to use NUL as a file separator -- which is what -print0 produces.
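One caveat: find does not promise any particular order, so the chapters may arrive at cat out of sequence. A possible refinement, sketched below, sorts the NUL-separated names first. The function name cat_chapters is hypothetical, and sort -z is assumed to be available (GNU sort has it; other implementations may not).

```shell
#!/bin/bash

# cat_chapters [ROOT]: hypothetical helper that writes every Chapters*.txt
# file under ROOT to standard output, in sorted path order.
cat_chapters () {
    find "${1:-.}" -name 'Chapters*.txt' -type f -print0 |
        sort -z |        # sort NUL-terminated records instead of lines
        xargs -0 cat     # hand the names to cat without splitting on spaces
}
```

For example, cat_chapters "THE BROTHER OF DAPHNE" > "THE BROTHER OF DAPHNE.txt" would join one book; the output file must live outside the searched directory or carry a name that does not match Chapters*.txt.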
