
Search text and append to each end of line of text file - OSX

I'm new to OSX command line tools.

I am trying to find a block of text in a file and append this text at the end of all lines in another text file. At run time I don't know what this text will be; I just know it will be located between "BEGINHMM" and "ENDHMM". Also, I don't know the makeup of the destination file, except that it will not be an empty text file.

The command which finds the block of text of interest is:

sed -n '/<BEGINHMM>/,/<ENDHMM>/p' proto

where "proto" is a text file containing the text of interest.

I've been trying to pipe the output of the above command to another 'sed' command, in the following manner:

xargs -I '{}' sed -i .bak 's/$/{}/' monophones0.txt 

but I am getting some bizarre results; for example, I see the "{}" inserted in the text.

I've also tried piping to:

xargs -0 sed -i .bak 's/$/&/' monophones0.txt

but I just get the printout (similar to terminal echo) of the text I am trying to grab.

Ultimately I want to loop over several 'proto' files in multiple directories, copy the text in the "BEGINHMM" ... "ENDHMM" block in each directory, and append the selected text to that directory's monophones.txt lines.

I am running the commands in the terminal, bash, OSX 10.12.2.

Any help would be appreciated.

(1) Your sed command is of the form sed -n '/A/,/B/p'; this will include the lines on which A and B occur, even if these strings do not appear at the beginning of the line. This form may have other surprises in store for you as well (what do you expect will happen if B is missing or repeated?), but the remainder of this post assumes that's what you want.

(2) It's not clear how you intend to specify the "proto" files, but you do indicate they might be in several directories, so for the remainder of this post, I'll assume they are listed, one per line, in a file named proto.txt in each directory. This will ensure that you don't run into any limitations on command-line length, but the following can easily be modified if you don't want to create such a file.

(3) Here is a script which will use the sed command you've mentioned to copy segments from each of the "proto" files specified in a directory to monophones0.txt in the directory in which the script is executed.

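#!/bin/bash

# File that receives the extracted blocks (added after its existing contents).
OUT=monophones0.txt

# Read one "proto" path per line; IFS= and -r keep whitespace and backslashes intact.
while IFS= read -r file
do
  if [ -r "$file" ] ; then
    # Print the lines from <BEGINHMM> through <ENDHMM>, inclusive.
    sed -n '/<BEGINHMM>/,/<ENDHMM>/p' "$file" >> "$OUT"
  elif [ -n "$file" ] ; then
    echo "NOT FOUND: $file" >&2
  fi
done < proto.txt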
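
The script above adds the extracted blocks after the existing contents of monophones0.txt. If the goal is instead, as the question puts it, to attach the extracted text to the end of every line, one possible sketch is shown below. The function name append_block and the SEP argument are hypothetical and not part of the original post. What should sit between each line and the block is an assumption (a single space by default). The extraction step reuses the sed range command from the question, and awk does the per-line append, so the block never has to be embedded in a sed replacement string.

```shell
#!/bin/bash
# append_block PROTO TARGET [SEP]
# Hypothetical helper: prints TARGET to stdout with the
# <BEGINHMM>..<ENDHMM> block from PROTO added to the end of every line.
# SEP (default: one space) is an assumption about the desired separator.
append_block() {
  local proto="$1" target="$2" sep="${3- }"
  local block_file
  block_file=$(mktemp) || return 1

  # Same extraction command as in the question.
  sed -n '/<BEGINHMM>/,/<ENDHMM>/p' "$proto" > "$block_file"

  # Load the block once in BEGIN, then print every target line followed
  # by the separator and the block. Note that awk -v interprets
  # backslash escapes in SEP, so "\n" would mean a newline.
  awk -v sep="$sep" -v bf="$block_file" '
    BEGIN {
      while ((getline line < bf) > 0)
        block = block (n++ ? "\n" : "") line
    }
    { print $0 sep block }
  ' "$target"

  rm -f "$block_file"
}
```

To update a file in place, the output could be written to a temporary file and moved over the original, for example: append_block proto monophones0.txt > monophones0.tmp && mv monophones0.tmp monophones0.txt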

Just like what you did before:

tmpfile=$(mktemp); sed -n '/<BEGINHMM>/,/<ENDHMM>/p' proto >$tmpfile; sed -i .bak "r $tmpfile" monophones0.txt; rm $tmpfile

This is the basic idea; there are other checks you need to perform to make this a robust script. – 4ae1e1
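
Both the script and the one-liner above rely on the range form sed -n '/A/,/B/p', so it may help to see the behaviour described in point (1) on a small input. The following is only a sketch with made-up sample data (the real proto file will look different). It shows that the boundary lines are included even when the marker is not at the start of the line, and that a missing end marker makes the range run to the end of the file.

```shell
#!/bin/bash
# Hypothetical sample data, created in a throwaway directory.
demo=$(mktemp -d)
printf '%s\n' '~o <VECSIZE> 39' 'x <BEGINHMM> y' '<NUMSTATES> 5' \
  '<ENDHMM>' 'trailer' > "$demo/proto"

# Inclusive range: the line containing <BEGINHMM> and the line containing
# <ENDHMM> are both printed; the first and last lines of the file are not.
with_end=$(sed -n '/<BEGINHMM>/,/<ENDHMM>/p' "$demo/proto")
printf '%s\n' "$with_end"

# Same data with the closing marker removed: the range never closes,
# so everything from <BEGINHMM> to the end of the file is printed.
grep -v '<ENDHMM>' "$demo/proto" > "$demo/proto_no_end"
without_end=$(sed -n '/<BEGINHMM>/,/<ENDHMM>/p' "$demo/proto_no_end")
printf '%s\n' "$without_end"

rm -r "$demo"
```

If the second behaviour is not acceptable, a check that each proto file contains both markers before extracting would be one of the additional checks worth adding.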
