Extract lines between 2 tokens in a text file using bash
I have a text file which looks like this:
random useless text
<!-- this is token 1 -->
para1
para2
para3
<!-- this is token 2 -->
random useless text again
I want to extract the text between the tokens (excluding the tokens, of course). I tried using ## and %% to extract the data in between, but it didn't work. I think they are not meant for manipulating such large text files. Any suggestions on how I can do it? Maybe awk or sed?
No need for head and tail or grep, or to read the file multiple times:
sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' inputfile
Explanation:
-n - don't do an implicit print
/<!-- this is token 1 -->/{ - if the starting marker is found, then
:a - label "a"
n - read the next line
/<!-- this is token 2 -->/b - if it's the ending marker, branch to the end of the script, so the marker is not printed (using q here instead would quit after the first block)
p - otherwise, print the line
ba - branch to label "a"
} - end if
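To see the steps above in action, here is a small sketch that recreates the sample file from the question in a temporary file and runs the same command on it. The temporary file and the `result` variable are only there for illustration, and the one-line `{...;ba}` form assumes GNU sed.

```bash
#!/bin/bash
# Recreate the question's sample text in a temporary file (illustration only).
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
random useless text
<!-- this is token 1 -->
para1
para2
para3
<!-- this is token 2 -->
random useless text again
EOF

# Same command as in the answer; assumes GNU sed for the one-line syntax.
result=$(sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' "$inputfile")

# Prints para1, para2 and para3, one per line.
printf '%s\n' "$result"
rm -f "$inputfile"
```

Because the ending marker branches to the end of the script rather than quitting, a second token 1 / token 2 pair later in the file would be extracted as well.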
You can extract it, including the tokens, with sed. Then use head and tail to strip the tokens off.
... | sed -n "/this is token 1/,/this is token 2/p" | head -n-1 | tail -n+2
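As far as I know, the negative count in `head -n-1` is a GNU extension, so the pipeline may not work with every head implementation. A possible alternative is to let a second sed delete the first and last lines of the extracted range. This is only a sketch: `strip_tokens` is a made-up helper name, and like the head/tail version it assumes a single pair of tokens in the input.

```bash
#!/bin/bash
# strip_tokens is a hypothetical helper name, not part of the answer above.
# It reads text on stdin and prints the lines between the two tokens.
strip_tokens() {
  # First sed: keep the range including both token lines.
  # Second sed: delete the first line (start token) and the last line (end token).
  sed -n '/this is token 1/,/this is token 2/p' | sed '1d;$d'
}

# Small demonstration using made-up input lines.
printf '%s\n' 'junk' '<!-- this is token 1 -->' 'para1' 'para2' '<!-- this is token 2 -->' 'junk' | strip_tokens
```

With a real file this would be called as `strip_tokens < inputfile`.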
No need to call the mighty sed / awk / perl. You could do it "bash-only":
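#!/bin/bash
# Print the lines between the two tokens using only bash built-ins.
STARTFLAG="false"
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r LINE; do
    if [ "$STARTFLAG" == "true" ]; then
        if [ "$LINE" == '<!-- this is token 2 -->' ]; then
            # End token reached: stop reading.
            break
        else
            printf '%s\n' "$LINE"
        fi
    elif [ "$LINE" == '<!-- this is token 1 -->' ]; then
        # Start token found: print from the next line on.
        STARTFLAG="true"
        continue
    fi
done < t.txt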
Kind regards,
realex
Try the following:
sed -n '/<!-- this is token 1 -->/,/<!-- this is token 2 -->/p' your_input_file |
    grep -Ev '<!-- this is token . -->'
Maybe sed and awk have more elegant solutions, but I have a "poor man's" approach with grep, cut, head, and tail.
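#!/bin/bash
dataFile="/path/to/some/data.txt"
startToken="token 1"
stopToken="token 2"
# Line number of the first line containing each token (-F: match as a fixed string).
startTokenLine=$(grep -n -F -- "$startToken" "$dataFile" | head -n 1 | cut -f 1 -d ':')
stopTokenLine=$(grep -n -F -- "$stopToken" "$dataFile" | head -n 1 | cut -f 1 -d ':')
# The last line to keep is the one just before the stop token.
stopTokenLine=$((stopTokenLine - 1))
# Number of lines strictly between the two tokens.
tailLines=$((stopTokenLine - startTokenLine))
head -n "$stopTokenLine" "$dataFile" | tail -n "$tailLines"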
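For comparison, the kind of awk solution alluded to above might look roughly like the following. It is a sketch of a common flag-based idiom, not something taken from this thread: `extract_with_awk` and the `inside` variable are made-up names, and the token text is copied from the question.

```bash
#!/bin/bash
# extract_with_awk is a hypothetical helper name; it reads text on stdin.
extract_with_awk() {
  awk '
    /<!-- this is token 2 -->/ { inside = 0 }  # end token: stop printing (checked first, so the token itself is skipped)
    inside                                     # while inside the block, print the current line
    /<!-- this is token 1 -->/ { inside = 1 }  # start token: print from the next line on
  '
}

# Small demonstration using made-up input lines.
printf '%s\n' 'junk' '<!-- this is token 1 -->' 'para1' 'para2' '<!-- this is token 2 -->' 'junk' | extract_with_awk
```

The order of the three rules matters: clearing the flag before the print rule and setting it after keeps both token lines out of the output. The file is read only once, so it could be used as `extract_with_awk < "$dataFile"`.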
For anything like this, I'd reach for Perl, with its combination of (amongst others) sed and awk capabilities. Something like (beware - untested):
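use strict;
use warnings;

my $recording = 0;
my @results = ();
while (<STDIN>) {
    chomp;
    if (/token 1/) {
        $recording = 1;
    }
    elsif (/token 2/) {    # Perl spells this "elsif", not "else if"
        $recording = 0;
    }
    elsif ($recording) {
        push @results, $_;
    }
}
# Print the collected lines, one per line.
print "$_\n" for @results;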
sed -n "/TOKEN1/,/TOKEN2/p" your_input_file | sed -e '/TOKEN1/d' -e '/TOKEN2/d'