
Modify text file

I need to modify all files that have a ".txt" extension within a directory in the following way:

Remove all text lines from the line that starts with "xxx" through the line that ends with "xxx", inclusive.

I know how to do this in Java or C++, but can someone show me a simple script that can get this done?

Thanks!

I assume that you want to drop everything from start through end, and that those words appear by themselves on their lines.

perl -ni.bak -e 'print unless /^start$/../^end$/' *.txt

Note that this keeps a backup of each modified file (with a .bak suffix) so that you can inspect the change and restore the original if you want.

Not that there's anything wrong with @btilly's answer (in fact, I would do it his way myself), but just to show you that There's More Than One Way To Do It, you could also use a substitution:

% perl -i.save -0777 -pe 's/^start.*?end$//gsm' *.txt

The non-greedy .*? keeps each match from running past its own end line when a file has more than one block. That will leave you an extra newline sequence where the block was, but it works if the end is at EOF and there's no newline. You could also take that into account this way:

% perl -i.save -0777 -pe 's/^start.*?end$\R?//gsm' *.txt

You said a line that starts with "xxx" but you didn't say that was all that was on the line, and you said the line that ends with "xxx", but you didn't say that was all that was on its line either. And you didn't mention what happens if those are the same line. I believe you'll find that my solution handles those cases.

It doesn't, however, handle the case of the start and the end strings overlapping. If you really want that, too, tell me and I'll fiddle with it so it works.

Another nice thing about using Perl for this is that it very easily works with UTF-8 data files, too:

bash-3.2$ cat /tmp/data
     1  fee 
     2  commencé
     3  fie foo
     4  fum
     5  terminé
     6  beat on 
     7  the drum

bash-3.2$ perl -Mutf8 -CSD -nle 'print unless /commencé/ .. /terminé/' /tmp/data
     1  fee 
     6  beat on 
     7  the drum

bash-3.2$ perl -i.guardé -Mutf8 -CSD -nle 'print unless /commencé/ .. /terminé/' /tmp/data

bash-3.2$ cat /tmp/data
     1  fee 
     6  beat on 
     7  the drum

bash-3.2$ cat /tmp/data.guardé 
     1  fee 
     2  commencé
     3  fie foo
     4  fum
     5  terminé
     6  beat on 
     7  the drum

Et voilà! :)

This is one of those problem domains where Perl especially lends itself to extremely short, simple, readable, and maintainable answers. It really is the ultimate Unix Power Tool.

Obviously you'll never approach this sort of power-tool operation from Java or C++. Ruby, I suspect, might be able to do something similar, but I think Python is too far from the Unix style to provide as succinct and simple an answer.

Plus it runs quite quickly, too: not quite as fast as C, but certainly much, much faster than some ponderously slow shell script. Well, at least if you do the linewise processing, that is. Reading everything into memory is never going to scale, but it's OK for little things. Also, shell tools tend to bomb out on files with binary data in them, or very long lines, so you can't always rely on them for such things, especially in a portable, cross-platform fashion. And almost none of them work reliably with Unicode, which is a real must these days.

ruby -i.bak -ne 'print unless /^start/.../^end/' *.txt
