Recursively concatenating (joining) and renaming text files in a directory tree
I am using Mac OS X Lion.
I have a folder, LITERATURE, with the following structure:
LITERATURE > Y > YATES, DORNFORD > THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
I want to recursively concatenate the chapters that are split into multiple files (not all are). Then, I want to write the concatenated file to its parent's parent directory. The name of the concatenated file should be the same as the name of its parent directory.
For example, after running the script (in the folder structure shown above) I should get the following.
LITERATURE > Y > YATES, DORNFORD:
THE BROTHER OF DAPHNE.txt
THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
In this example, the parent directory is THE BROTHER OF DAPHNE and the parent's parent directory is YATES, DORNFORD.
[Updated March 6th: rephrased the question/answer so that it is easy to find and understand.]
It's not clear what you mean by "recursively" but this should be enough to get you started.
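#!/bin/bash
# Needs Bash 4 or later for the ${var,,} and ${var^} case conversions.
titlecase () {  # adapted from http://stackoverflow.com/a/6969886/874188
    local arr
    arr=("${@,,}")      # lowercase every word
    echo "${arr[@]^}"   # capitalize the first letter of each word
}
for book in LITERATURE/?/*/*; do
    # Deliberately unquoted so that each word of the title is a separate argument.
    title=$(titlecase ${book##*/})
    # Match only *.txt so the output file, which the redirection creates
    # before the glob is expanded, is not read back as input.
    for file in "$book"/*.txt; do
        cat "$file"
        echo
    done >"$book/$title"
    echo '# not doing this:' rm "$book"/*.txt
done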
This loops over LITERATURE/initial/author/BOOK TITLE and creates a file Book Title (where should a space be added?) from the catenated files in each book directory. (I would generate it in the parent directory and then remove the book directory completely, assuming it contains nothing of value any longer.) There is no recursion, just a loop over this directory structure.
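A minimal sketch of the variant suggested in the parenthesis above, assuming the same LITERATURE/initial/author/BOOK TITLE layout: it writes each combined file into the author directory, next to the book directory, and keeps the directory name unchanged so no case conversion is needed. The function name 'concat_books' and its root argument are illustrative and not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper: concat_books ROOT
# For every ROOT/<initial>/<author>/<book> directory, join its *.txt files
# (in glob order) into ROOT/<initial>/<author>/<book>.txt.
concat_books () {
    local root=$1 book file parts
    for book in "$root"/?/*/*/; do          # trailing slash: directories only
        book=${book%/}                      # drop that trailing slash again
        [ -d "$book" ] || continue          # unmatched pattern stays literal; skip it
        parts=("$book"/*.txt)
        [ -f "${parts[0]}" ] || continue    # no .txt files in this book directory
        for file in "${parts[@]}"; do
            cat "$file"
            echo                            # blank line between parts, as in the loop above
        done > "$book.txt"                  # lands beside the book directory
    done
}
```

It could be called as concat_books LITERATURE from the directory that contains LITERATURE. Nothing is deleted, so the chapter files can be checked against the result first.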
Removing the chapter files is a bit risky so I'm not doing it here. You could remove the echo prefix from the line after the first done to enable it.
If you have book names which contain an asterisk or some other shell metacharacter this will be rather more complex -- the title assignment assumes you can use the book title unquoted.
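One possible way around the unquoted title, sketched here as an assumption rather than as part of the original answer: pass the title as a single quoted argument and let awk split it into words, so the shell never expands an asterisk in the name. The function name 'titlecase_quoted' is hypothetical. It also avoids the Bash 4 case-conversion expansions.

```bash
#!/bin/bash
# Hypothetical replacement for the unquoted call: the title arrives as one
# quoted argument, and awk (not the shell) splits it into words.
titlecase_quoted () {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++) {
            # Uppercase the first character of the word, lowercase the rest.
            $i = toupper(substr($i, 1, 1)) tolower(substr($i, 2))
        }
        print
    }'
}
```

In the loop it would be used as title=$(titlecase_quoted "${book##*/}"). Note that awk rebuilds the line with single spaces, so runs of whitespace in a title are collapsed.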
Only the parameter expansion with case conversion is beyond the very basics of Bash. The array operations could perhaps also be a bit scary if you are a complete beginner. Proper understanding of quoting is also often a challenge for newcomers.
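# Join the parts in glob order.
cat Chapters*.txt > FinaleFile.txt.raw
# Collect the chapter ranges from the file names, e.g. "01-05 06-10 11-end":
# gather all names in the hold space, then strip newlines, "Chapters " and ".txt".
Chapters="$( ls -1 Chapters*.txt | sed -n 'H;${x;s/\
//g;s/ *Chapters //g;s/\.txt/ /g;s/ *$//p;}' )"
mv FinaleFile.txt.raw "FinaleFile ${Chapters}.txt"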
Thanks for all your input. They got me thinking, and I managed to concatenate the files using the following steps:
#!/bin/bash
# We are going to iterate through the directory tree, up to a maximum depth of 20.
for i in $(seq 1 20)
do
    # In UNIX based systems, files and directories are the same (Everything is a File!).
    # The 'find' command lists all files which contain spaces in their names. The | (pipe) …
    # … forwards the list to a 'while' loop that iterates through each file in the list.
    find . -maxdepth "$i" -name '* *' | while IFS= read -r file
    do
        # Here, we use 'sed' to replace spaces in the filename with underscores.
        # The 'echo' prints a message to the console before renaming the file using 'mv'.
        item=$(echo "$file" | sed 's/ /_/g')
        echo "Renaming '$file' to '$item'"
        mv "$file" "$item"
    done
done
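The same renaming can probably be done in a single pass. The sketch below is an alternative to the depth-by-depth loop above, not the method used in this answer: it asks 'find' to list the deepest entries first and renames only the last path component. The function name 'underscore_names' is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: underscore_names ROOT
# Replace spaces with underscores in every file and directory name below ROOT.
underscore_names () {
    local root=$1 path dir base
    # -depth lists a directory's contents before the directory itself, so a
    # parent is renamed only after everything inside it has been handled.
    # -mindepth 1 leaves ROOT itself alone.
    find "$root" -mindepth 1 -depth -name '* *' -print0 |
        while IFS= read -r -d '' path; do
            dir=${path%/*}                  # everything up to the last slash
            base=${path##*/}                # the name that contains spaces
            mv "$path" "$dir/${base// /_}"
        done
}
```

Because only the final component changes on each 'mv', the paths that 'find' has already printed for the parent directories stay valid until their own turn comes.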
#!/bin/bash
# Here, we go through all the directories (up to a depth of 20).
# The unquoted 'find' output is safe only because the previous script removed all spaces.
for D in $(find . -maxdepth 20 -type d)
do
    # Check if the parent directory contains any files of interest.
    if ls "$D"/Part*.txt &>/dev/null ||
       ls "$D"/Chapter*.txt &>/dev/null ||
       ls "$D"/Section*.txt &>/dev/null ||
       ls "$D"/Book*.txt &>/dev/null
    then
        # If we get here, then there are split files in the directory; we will concatenate them.
        # First, we trim the full directory path ($D) so that we are left with the path to the …
        # … files' parent's parent directory. We will write the concatenated file here. (✝)
        ppdir="$(dirname "$D")"
        # Here, we concatenate the files using 'cat'. The 'basename' command extracts the name of …
        # … the parent directory from the full directory path ($D) and gives us the filename.
        # Finally, we write the concatenated file to its parent's parent directory. (✝)
        cat "$D"/*.txt > "$ppdir/$(basename "$D").txt"
    fi
done
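If the names should keep their spaces, the concatenation step can be written so that it does not depend on the renaming script. This is a sketch under that assumption, using NUL-separated 'find' output and quoted paths. The function name 'concat_split_dirs' is hypothetical, and the four prefixes are the ones tested in the script above.

```bash
#!/bin/bash
# Hypothetical helper: concat_split_dirs ROOT
# For each directory below ROOT holding Part*/Chapter*/Section*/Book* .txt
# files, join all its .txt files into a sibling file named after it.
concat_split_dirs () {
    local root=$1 dir prefix parts
    find "$root" -mindepth 1 -type d -print0 |
        while IFS= read -r -d '' dir; do
            for prefix in Part Chapter Section Book; do
                parts=("$dir/$prefix"*.txt)
                # An unmatched glob stays as the literal pattern, so check
                # that the first entry really is a file.
                if [ -f "${parts[0]}" ]; then
                    # "$dir.txt" sits beside $dir, i.e. in the parent's
                    # parent directory of the split files.
                    cat "$dir"/*.txt > "$dir.txt"
                    break
                fi
            done
        done
}
```

Run as concat_split_dirs . from the top of the tree. As in the script above, the parts are joined in glob order, which for names like Chapters 01-05.txt and Chapters 06-10.txt is also reading order.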
Now, we delete all the files that we concatenated so that their parent directory is left empty.
find . -type f -name 'Part*.txt' -delete
find . -type f -name 'Chapter*.txt' -delete
find . -type f -name 'Section*.txt' -delete
find . -type f -name 'Book*.txt' -delete
(✝) We wrote the concatenated file to its parent's parent directory so that its parent directory is left empty after deleting all the split files.
The following command will delete empty directories.
find . -type d -empty -delete
[Updated March 6th: rephrased the question/answer so that it is easy to find and understand.]
Shell doesn't like white space in names. However, over the years, Unix has come up with some tricks that'll help:
$ find . -name "Chapters*.txt" -type f -print0 | xargs -0 cat >> final_file.txt
Might do what you want.
The find command recursively finds all of the directory entries in a file tree that match the query (in this case, the type must be a file, and the name must match the pattern Chapters*.txt).
Normally, find separates the directory entry names with NL, but -print0 tells it to separate them with the NUL character instead. NL is a valid character in a file name, but NUL isn't.
The xargs command takes the output of the find and processes it. xargs gathers all the names and passes them in bulk to the command you give it -- in this case the cat command.
Normally, xargs separates file names by white space, which means Chapters would be one file and 01-05.txt would be another. However, -0 tells xargs to use NUL as the separator -- which is what -print0 produces.
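To see the pairing of -print0 and -0 at work on names with spaces, here is a small sketch that wraps the pipeline above in a function. The name 'join_chapters' and its two arguments are hypothetical. One caveat worth stating: find does not sort its output, so the parts are joined in whatever order the directory lists them.

```bash
#!/bin/bash
# Hypothetical helper: join_chapters DIR OUTFILE
# Concatenate every Chapters*.txt file found below DIR into OUTFILE.
join_chapters () {
    local dir=$1 out=$2
    # -print0 ends each name with NUL; xargs -0 splits on NUL only, so
    # "Chapters 01-05.txt" reaches cat as a single argument.
    find "$dir" -name 'Chapters*.txt' -type f -print0 | xargs -0 cat > "$out"
}
```

For example, join_chapters "THE BROTHER OF DAPHNE" final_file.txt. The sketch uses > rather than >> so that running it twice does not append a second copy.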