
Sed/Awk to search and replace/insert text in files

I am trying to update or insert a few comments, such as copyright headers, into all the source files in a directory (Linux). My files are inconsistent: some of them already have headers, while others have none at all. I tried using sed to look at the first few lines and replace. By "replace" I mean changing the files that already have a copyright header so that they carry the latest one.

sed -e '1,10 s/Copyright/*Copyright*/g' file

But this will not insert anything if the pattern is not found. How can I achieve this?

The example I gave in the comments, that is, what I am actually trying to replace or insert, is a typical multi-line copyright header like this:

/*
* Copyright 1234 XXXNAME, XYZPlace 
*  text text text text ...........
* blah blah blah */

It may also contain some special characters.

If I understand correctly, you want to:

  • Find files without a Copyright notice in the first 10 lines, and
  • Add a Copyright notice to those files.

In addition, you want to:

  • Find files WITH a Copyright notice in the first 10 lines, and
  • Update their notice to your standard text.

It seems to me that these two tasks could be boiled down to a single set:

  • Remove any existing Copyright notice in the first 10 lines, then
  • Insert a new Copyright notice into the file.

If we can safely assume that a shortened version of the sample text you put in a comment on your question is valid, and that it should be inserted at, for example, line 2 of each file, then the following should achieve the very first set of requirements if you're using GNU sed:

find . -type f -not -exec grep -q Copyright {} \; -exec sed -i'' '2i/* Copyright */' {} \;

If you're not running GNU sed (i.e. you're on FreeBSD, OS X, Solaris, etc.), let us know, because the sed script will be different.

How does this work?

The find command is getting the following options:

  • -type f tells it to look only at regular files (not directories or devices).
  • -not inverts the option that follows it.
  • -exec grep -q Copyright {} \; succeeds for any file with Copyright in it; because of the preceding -not, only files without it pass through.
  • -exec sed -i'' '2i/* Copyright */' {} \; inserts your copyright notice before line 2.
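To see the filtering part in isolation, here is a small sketch that runs the same find expression in a scratch directory, with -print standing in for the sed action. The directory and file names are made up for the demo, and it assumes a find that accepts -not (GNU and BSD find both do).

```shell
# Scratch demo: which files get past the "-not -exec grep -q" filter?
# The file names below are hypothetical, created only for this demo.
tmp=$(mktemp -d)
printf '/* Copyright 2001 */\nint a;\n' > "$tmp/has_notice.c"
printf 'int b;\nint c;\n' > "$tmp/no_notice.c"

# grep -q exits 0 when the pattern is found; -not flips that result, so
# only files WITHOUT "Copyright" reach the action that follows
# (-print here, sed -i in the real command).
missing=$(cd "$tmp" && find . -type f -not -exec grep -q Copyright {} \; -print)
echo "$missing"

rm -rf "$tmp"
```

Only the file without a notice is listed, which is the set the sed action would then modify.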

This solution may run into difficulty if you want your copyright notice to include special characters that would be interpreted by the sed script. But it answers your question. :)

If instead we want to handle the revised requirements, i.e. remove existing copyright notices first, then we can do this with two one-liners.

First, we remove existing copyright notices.

find . -type f -exec sh -c 'head "$1" | grep -q Copyright' sh {} \; -exec sed -i '1,10{/Copyright/d;}' {} \;

This may be a little redundant, unless you want to traverse subdirectories recursively, which find does by default. The sed script does nothing to files that have no Copyright line in the first 10 lines, so if all your files are in one directory, the following should work as well:

for file in *; do sed -i '1,10{/Copyright/d;}' "$file"; done

Next, we add new ones back in.

for file in *;do sed -i'' '2i/* Copyright */' "$file"; done

Or, if you want to do this recursively through subdirectories:

find . -type f -exec sed -i'' '2i/* Copyright */' {} \;
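As a rough end-to-end check of the remove-then-insert idea, the sketch below runs both steps on two throwaway files, one with an old single-line notice and one without. It assumes GNU sed, uses a `1,10{/Copyright/d;}` form for the removal step, and the file names and contents are invented for the demo.

```shell
# Scratch demo of "remove old notice, then insert new one" (GNU sed assumed).
tmp=$(mktemp -d)
printf '/* Copyright 1999 Old Owner */\nint a;\nint b;\n' > "$tmp/old.c"
printf 'int x;\nint y;\n' > "$tmp/bare.c"

for file in "$tmp"/*.c; do
  # Step 1: delete any line mentioning Copyright within the first 10 lines.
  sed -i '1,10{/Copyright/d;}' "$file"
  # Step 2: insert the new notice before line 2 (the file needs >= 2 lines).
  sed -i '2i/* Copyright */' "$file"
done

old_result=$(cat "$tmp/old.c")
bare_result=$(cat "$tmp/bare.c")
printf '%s\n---\n%s\n' "$old_result" "$bare_result"

rm -rf "$tmp"
```

Both files end up with exactly one notice on line 2. Note that step 1 only deletes the lines that contain the word Copyright, so a multi-line header would leave its other lines behind.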

FINAL UPDATE:

I can't spend more time on this one after this.

find . -type f \
  -exec sh -c 'head "$1" | grep -q Copyright' sh {} \; \
  -exec sed -i -ne '1h;1!H;${;g;s:/\*.*Copyright.*\*/:/* Copyright 1998-2012 */:g;p;}' {} \;

What?

The first -exec searches for the word "Copyright" in the first 10 lines of the file (head prints 10 lines by default), just like the earlier example. If grep finds anything, this condition returns true.

The second -exec does the substitution. It reads the entire file into sed's hold buffer: 1h copies the first line there, and 1!H appends every later line. When it gets to the end of the file ($), g copies the hold buffer back into the pattern space, s does a multi-line substitution on it, and p prints the result.
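If the hold-buffer part is unclear, this tiny sketch shows just the slurp, with sed's `l` command printing the pattern space unambiguously at the end. The three-line input is made up, and the `${g;l;}` one-line form assumes GNU sed.

```shell
# Collect every line in the hold buffer, then at the last line copy it
# back to the pattern space and list it with "l" (newlines show as \n,
# and the end of the pattern space is marked with $).
slurped=$(printf 'a\nb\nc\n' | sed -n '1h;1!H;${g;l;}')
echo "$slurped"
```

The single output line shows that, by the time the `s` command runs, the whole file is one string with embedded newlines, which is why the pattern can span lines.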

Note that this may very well require some tuning, and it can go wrong if you have comments elsewhere in the file. As far as I know, sed (GNU included) has no non-greedy star, so .* matches as much as it can: the substitution runs from the first /* before "Copyright" to the last */ in the file.
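The greedy-match problem is easy to reproduce. In the sketch below the sample input has a second comment after the header; the first command uses the `.*` pattern, the second swaps in `[^/]*` as a bounded stand-in for a non-greedy match. That second pattern is only a workaround under the assumption that the header itself contains no `/` character (so no URLs or paths), and it relies on GNU sed letting a negated bracket expression match newlines.

```shell
# Made-up input: a multi-line header plus a later, unrelated comment.
input=$(printf 'one\n/* Copyright old\n * text\n */\ntwo\n/* other comment */\nthree\n')

# Greedy: .* runs on to the LAST "*/", swallowing "two" and the other comment.
greedy=$(printf '%s\n' "$input" |
  sed -n '1h;1!H;${g;s:/\*.*Copyright.*\*/:/* Copyright 1998-2012 */:;p;}')

# Bounded: [^/]* cannot cross a "/", so the match stops at the first "*/".
# Assumption: no "/" appears inside the header text.
bounded=$(printf '%s\n' "$input" |
  sed -n '1h;1!H;${g;s:/\*[^/]*Copyright[^/]*\*/:/* Copyright 1998-2012 */:;p;}')

printf 'greedy:\n%s\n\nbounded:\n%s\n' "$greedy" "$bounded"
```

With the greedy pattern, "two" and the second comment disappear; with the bounded one, only the header is replaced.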

Here's my test:

$ printf 'one\n/* Copyright blah blah\n *\n */\ntwo\n' | sed -n '1h;1!H;${;g;s:/\*.*Copyright.*\*/:/* Copyright 1998-2012 */:g;p;}'
one
/* Copyright 1998-2012 */
two

This doesn't preserve your existing Copyright information, but at least it addresses the multi-line issue.

Edit: the command below won't work if you have file names with spaces; see the first comment.

It can surely be done with sed alone, but the first thing that came to my mind is to do the substitution on the files where the line is present, and then add the header to the rest of the files using something like

for f in $(grep -L 'Copyright' *); do sed -i '1i *Copyright*' "$f"; done

That will work for all files in the current folder; add the -r option to grep if you need recursion.

PS: I suggest leaving out sed's -i option while testing, and adding it only once you're sure the command works correctly.
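For file names with spaces, one possible variant is to loop over a glob instead of over grep's output, and let grep -q decide per file. This is a sketch rather than a drop-in replacement: it assumes GNU sed, is not recursive, and the scratch files are invented for the demo.

```shell
# Scratch demo: glob-based loop that tolerates spaces in file names.
tmp=$(mktemp -d)
printf 'int a;\n' > "$tmp/my file.c"
printf '* Copyright 2001 *\nint b;\n' > "$tmp/done.c"

for f in "$tmp"/*; do
  [ -f "$f" ] || continue                     # skip directories etc.
  # Insert the notice only when the file has no Copyright line yet.
  grep -q 'Copyright' "$f" || sed -i '1i *Copyright*' "$f"
done

spaced=$(cat "$tmp/my file.c")
untouched=$(cat "$tmp/done.c")
printf '%s\n---\n%s\n' "$spaced" "$untouched"

rm -rf "$tmp"
```

Quoting "$f" everywhere is what keeps "my file.c" in one piece; the file that already had a notice is left alone.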

To insert a single line containing the text copyright at line 1 of a file, only if it isn't already there, you could do:

sed '1{ /copyright/!i\
copyright
}' input-file

To insert multiple lines:

sed '1{ /copyright/!i\
copyright\
second line
}' input-file

It's tempting to use r to read the copyright from a file, but I cannot figure out how to insert it before line 1 rather than after line 1. For example:

sed '1{ /copyright/! { x; r copyright-file
G}}' input-file

This seems like it ought to do the trick, but the text from the copyright-file winds up starting at line 2.
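If sed is not a hard requirement for this step, a plain-shell fallback is to concatenate the header file and the source file into a temporary file and rename it. The sketch below is one way to do that; `prepend_header` is a hypothetical helper name, the check looks at the first 10 lines rather than only line 1 so that re-running it does not add a second header, and the file contents are made up.

```shell
# Scratch demo: put the contents of a header file BEFORE line 1.
tmp=$(mktemp -d)
printf '/*\n * copyright 1234 XXXNAME\n */\n' > "$tmp/copyright-file"
printf 'int main(void) { return 0; }\n' > "$tmp/input-file"

# Hypothetical helper: prepend_header HEADER TARGET
prepend_header() {
  header=$1
  target=$2
  # Do nothing if "copyright" already appears in the first 10 lines.
  if head -n 10 "$target" | grep -q 'copyright'; then
    return 0
  fi
  # The header text is never parsed by sed, so it needs no escaping.
  cat "$header" "$target" > "$target.new" && mv "$target.new" "$target"
}

prepend_header "$tmp/copyright-file" "$tmp/input-file"
prepend_header "$tmp/copyright-file" "$tmp/input-file"   # second run is a no-op

result=$(cat "$tmp/input-file")
echo "$result"

rm -rf "$tmp"
```

Because the rename creates a new file, the target's permissions fall back to the defaults; writing back with `cat "$target.new" > "$target"` instead is an option if that matters.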
