Delete all lines between two patterns (exclusive of the pattern lines) using sed or awk
I have a somewhat large output text file from which I need to delete all lines between two patterns, while keeping the lines that match the patterns.
The files look vaguely like the following output.
TEST #1
coef1 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef2 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
indicator |
0 | .6647992 2.646627 0.25 0.802 -4.55925 5.888849
1 | 2.118701 5.225777 0.41 0.686 -8.19621 12.43361
|
year |
2 | -.4324005 2.231387 -0.19 0.847 -4.836829 3.972028
3 | -.362762 1.97184 -0.18 0.854 -4.254882 3.529358
|
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
TEST #2
coef2 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef3 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
year |
4 | .6647992 2.646627 0.25 0.802 -4.55925 5.888849
5 | 2.118701 5.225777 0.41 0.686 -8.19621 12.43361
|
idnumber |
6 | -.4324005 2.231387 -0.19 0.847 -4.836829 3.972028
7 | -.362762 1.97184 -0.18 0.854 -4.254882 3.529358
|
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
I need to take the output above and delete all the lines between "year" and "_cons", but I need to keep both the "year" line and the line starting with "_cons". The desired output is:
TEST #1
coef1 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef2 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
indicator |
0 | .6647992 2.646627 0.25 0.802 -4.55925 5.888849
1 | 2.118701 5.225777 0.41 0.686 -8.19621 12.43361
|
year |
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
TEST #2
coef2 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef3 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
year |
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
I wrote the following script (under OS X):
sed '/^ +year/,/^ +_cons/{/^ +year/!{/^ +_cons/!d}}' input.txt >output.txt
but I got the following error:
sed: 1: "/^ +year/,/^ +_cons/{/^ ...": extra characters at the end of d command
I'm not sure whether this approach is even correct, because I can't get sed to run it. Is sed even appropriate here, or should I use awk?
One last note: I need this script to work on a relatively generic Unix install. I have to send it to someone who must run it on a very basic AIX install (I think). No Perl, no Python, and I can't do much troubleshooting of their install over email.
This should work:
awk '/year/{print; getline; while($0!~/_cons/) {getline}}1' INPUT_FILE
or
awk '/_cons/{print;f=0;next}/year/{f=1;print;next}f{next}1' INPUT_FILE
[jaypal:~/Temp] awk '/year/{print; getline; while($0!~/_cons/) {getline}}1' file
TEST #1
coef1 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef2 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
indicator |
0 | .6647992 2.646627 0.25 0.802 -4.55925 5.888849
1 | 2.118701 5.225777 0.41 0.686 -8.19621 12.43361
|
year |
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
TEST #2
coef2 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef3 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
year |
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
[jaypal:~/Temp] awk '/_cons/{print;f=0;next}/year/{f=1;print;next}f{next}1' file
TEST #1
coef1 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef2 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
indicator |
0 | .6647992 2.646627 0.25 0.802 -4.55925 5.888849
1 | 2.118701 5.225777 0.41 0.686 -8.19621 12.43361
|
year |
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
TEST #2
coef2 | 48.36895 3.32013 14.57 0.000 41.86141 54.87649
coef3 | -50.08894 10.47335 -4.78 0.000 -70.61697 -29.56092
|
year |
_cons | 16.95753 6.342342 2.67 0.008 4.526383 29.38869
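For readers who want to see what the second one-liner does, here is a sketch of it spread over several lines with comments; the wrapper function name drop_between_year_cons is hypothetical. One reason to prefer this flag version over the getline loop: if a year line is never followed by a _cons line, getline returns 0 at end of input without changing $0, so while($0!~/_cons/) never exits, whereas the flag version simply drops the remaining lines and stops.

```shell
#!/bin/sh
# Sketch only: the flag-based one-liner above, one rule per line.
# drop_between_year_cons is a hypothetical name chosen for this example.
# It reads the files given as arguments (or stdin) and writes to stdout.
drop_between_year_cons() {
    awk '
        /_cons/ { print; f = 0; next }  # end marker: print it and stop skipping
        /year/  { f = 1; print; next }  # start marker: print it and start skipping
        f       { next }                # inside a block: drop the line
        1                               # any other line: print it unchanged
    ' "$@"
}

# Example: drop_between_year_cons input.txt > output.txt
```

Because it only uses plain pattern/action rules and next, this should behave the same under any POSIX awk.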
Try adding a semicolon after d to indicate that the command has ended. (GNU sed, the only sed I have handy to test with, doesn't require this, but maybe another sed would?)
Also, if you need to support multiple implementations of sed, then you can't use + to mean "one or more": it's not standard (in POSIX basic regular expressions + is an ordinary character), and not all implementations support it. You can use \{1,\}, but that's pretty ugly. I'd just use * and tack on an extra copy of the space, so that a space plus " *" means "one or more spaces".
So:
sed '/^ * year/,/^ * _cons/{/^ * year/!{/^ * _cons/!d;}}' input.txt >output.txt
(Tested, but only using GNU sed, not OS X, and certainly not AIX, sorry.)
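If the semicolon question stays open for the OS X and AIX seds, one way to sidestep it is to put each sed command on its own line, since POSIX sed accepts a newline wherever a command ends. Below is a sketch under that assumption, wrapped in a hypothetical shell function drop_between. It also changes "^ * year" to "^ *year" so the patterns match with or without leading indentation (the sample above shows no indentation, while the question's "^ +" suggests the real lines are indented).

```shell
#!/bin/sh
# Sketch only: the answer's sed command with one command per line, so no
# ";" is needed before "}". drop_between is a hypothetical name, and
# "^ *" (zero or more spaces) is an assumption that lets the patterns
# match lines with or without leading indentation.
drop_between() {
    sed '
/^ *year/,/^ *_cons/{
    /^ *year/!{
        /^ *_cons/!d
    }
}
' "$@"
}

# Example: drop_between input.txt > output.txt
```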
This might work for you:
sed '/year/,/_cons/{//!d}' file
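The // in {//!d} is an empty regular expression, which sed treats as the last regular expression it used. On the year line that is /year/ (which matched), on the lines in between it is /_cons/ being tried as the range end (which failed), and on the _cons line it is /_cons/ again (which matched), so only the in-between lines are deleted. A small check on a hypothetical sample, comparing the shortcut against the explicit nested form; a ; is added before each } as a precaution for seds that need it.

```shell
#!/bin/sh
# Sketch only: compare the "//!d" shortcut with the explicit nested form
# on a small hypothetical sample.
sample='TEST #2
coef2 | 1
year |
4 | 2
idnumber |
_cons | 3'

# "//" reuses the last regex sed applied; here that is always one tested
# on the current line (the range start or the range end).
short=$(printf '%s\n' "$sample" | sed '/year/,/_cons/{//!d;}')
long=$(printf '%s\n' "$sample" | sed '/year/,/_cons/{/year/!{/_cons/!d;};}')

printf '%s\n' "$short"
```

Only the GNU sed behavior has been reasoned through here; other seds may differ in how they track the "last regex" across address checks.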
or:
awk '/_cons/{p=0};!p;/year/{p=1}' file
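In this awk version, p is a "skipping" flag (awk variables start at 0), and the order of the three rules is what keeps the marker lines: _cons clears the flag before the !p print test, and year sets it only after. A sketch with a hypothetical sample showing that swapping the first and last rules would drop the marker lines too:

```shell
#!/bin/sh
# Sketch only: the same three awk rules in two orders, on a hypothetical sample.
sample='year |
4 | 1
_cons | 2
tail'

# Answer's order: clear flag, print test, set flag -> marker lines kept.
exclusive=$(printf '%s\n' "$sample" | awk '/_cons/{p=0}; !p; /year/{p=1}')
# Swapped order: set flag, print test, clear flag -> marker lines dropped.
inclusive=$(printf '%s\n' "$sample" | awk '/year/{p=1}; !p; /_cons/{p=0}')

printf '%s\n' "$exclusive"   # year |, _cons | 2, tail (one per line)
printf '%s\n' "$inclusive"   # tail
```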
You can also do it interactively in an editor: just open the file in gVim and run the command:
:g/^\s*year/+1,/^\s*_cons/-1 d
g              global command: run the rest on every line matching the pattern
/^\s*year/+1   match the line below each year line
/^\s*_cons/-1  match the line above the next _cons line
d              delete the range
To summarize and generalize the two GNU sed solutions that work:
sed '/BEGIN/,/END/{/BEGIN/!{/END/!d;}}' input.txt
sed '/BEGIN/,/END/{//!d}' input.txt
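The awk flag approach generalizes the same way and only needs a POSIX awk, which may suit the bare AIX install mentioned in the question. Below is a sketch of a reusable shell function; the name delete_between and its argument order are hypothetical. Values passed with awk -v have backslash escape sequences processed, so a backslash in a pattern must be doubled.

```shell
#!/bin/sh
# Sketch only: a reusable form of the awk flag approach.
# delete_between and its argument order are hypothetical.
# Usage: delete_between START_RE END_RE [file...]
# Lines strictly between a START_RE line and the next END_RE line are
# removed; the marker lines themselves are kept.
# Note: awk -v processes backslash escapes, so double any "\" in a pattern.
delete_between() {
    start_re=$1
    end_re=$2
    shift 2
    awk -v start_re="$start_re" -v end_re="$end_re" '
        $0 ~ end_re   { skip = 0 }  # end marker: stop skipping, and keep the line
        !skip                       # print whenever we are not skipping
        $0 ~ start_re { skip = 1 }  # start marker: skip the lines after it
    ' "$@"
}

# Example: delete_between '^ *year' '^ *_cons' input.txt > output.txt
```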