Extract lines between 2 tokens in a text file using bash

Question

I have a text file that looks like this:

random useless text 
<!-- this is token 1 --> 
para1 
para2 
para3 
<!-- this is token 2 --> 
random useless text again

I want to extract the text in between the tokens (excluding the tokens, of course). I tried using ## and %% (bash's ${var##pattern} and ${var%%pattern} parameter expansions) to extract the data in between, but it didn't work. I think they are not meant for manipulating such large text files. Any suggestions on how I can do it? Maybe awk or sed?
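For comparison, the ## / %% idea can work once the whole file has been read into a single variable. The sketch below assumes each marker appears exactly once and that the file fits comfortably in memory. The name extract_between is hypothetical. In bash, # and % remove the shortest matching prefix or suffix, and ## and %% remove the longest. On very large strings these pattern operations can be slow, which may be part of what went wrong.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines strictly between two literal markers
# using only bash parameter expansion. Reads the whole file into memory.
# Usage: extract_between FILE START END
extract_between() {
    local file=$1 start=$2 end=$3
    local content rest
    content=$(<"$file")                      # whole file as one string
    rest=${content#*"$start"}                # drop everything up to and including START
    [[ $rest == "$content" ]] && return 1    # START not found
    rest=${rest#*$'\n'}                      # drop the rest of START's line
    rest=${rest%%"$end"*}                    # drop END and everything after it
    rest=${rest%$'\n'*}                      # drop the part of END's line before END
    printf '%s\n' "$rest"
}
```

Usage: extract_between inputfile '<!-- this is token 1 -->' '<!-- this is token 2 -->'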

Answer 1

No need for head and tail, or grep, or reading the file multiple times:

sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba;}' inputfile

Explanation:

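-n - don't do an implicit print
/<!-- this is token 1 -->/{ - if the starting marker is found, then
  - :a - label "a"
  - n - read the next line
  - /<!-- this is token 2 -->/b - if it's the ending marker, branch to the end of the script (the marker is not printed, and sed goes on looking for the next starting marker)
  - p - otherwise, print the line
  - ba - branch to label "a"
} - end if

Note that chaining labels and commands on one line with ";" is a GNU sed extension; BSD/macOS sed ends a label only at a newline, so there the script has to be split across lines.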
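The same one-pass, flag-driven logic can be written in awk, which the question also asked about. This is a minimal sketch. extract_awk is a hypothetical helper name, and the markers are treated as awk extended regular expressions. That works for the markers here, which contain no regex metacharacters. Be aware that -v assignments also interpret backslash escapes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines strictly between two marker regexes.
# Handles several marked blocks in one pass.
# Usage: extract_awk START_RE END_RE FILE
extract_awk() {
    awk -v start_re="$1" -v end_re="$2" '
        $0 ~ end_re   { inside = 0 }  # end marker: stop before it is printed
        inside                        # inside a block: print the line
        $0 ~ start_re { inside = 1 }  # start marker: print from the next line
    ' "$3"
}
```

Because the end test runs before the print rule and the start test runs after it, neither marker line is printed.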

Answer 2

You can extract it, including the tokens, with sed. Then use head and tail to strip the tokens off.

... | sed -n "/this is token 1/,/this is token 2/p" | head -n-1 | tail -n+2
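head -n -1 uses a negative count, meaning "all but the last line". That is a GNU coreutils extension, so the pipeline above may fail with BSD/macOS head. The sketch below is a more portable variant that assumes a single marked block. It uses a hypothetical helper name, extract_range_trim, and has a second sed drop the first and last line instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same idea as sed | head | tail, but the trimming of
# the two marker lines is done with POSIX sed ('1d' = first, '$d' = last).
# Assumes the file contains exactly one marked block.
extract_range_trim() {
    local file=$1
    sed -n '/this is token 1/,/this is token 2/p' "$file" | sed '1d;$d'
}
```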

Answer 3

No need to call the mighty sed/awk/perl. You can do it "bash-only":

#!/bin/bash
STARTFLAG="false"
# read -r keeps backslashes literal; the default IFS still trims surrounding whitespace
while read -r LINE; do
    if [ "$STARTFLAG" == "true" ]; then
            if [ "$LINE" == '<!-- this is token 2 -->' ];then
                    exit
            else
                    echo "$LINE"
            fi
    elif [ "$LINE" == '<!-- this is token 1 -->' ]; then
            STARTFLAG="true"
            continue
    fi
done < t.txt

Kind regards

realex
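Below is a slightly more defensive sketch of the same pure-bash loop, written as a hypothetical function named between_markers. IFS= read -r preserves each line exactly, including leading whitespace and backslashes. The markers are matched as substrings, so trailing spaces after a marker do not break the comparison. Instead of exiting at the end marker, the loop keeps scanning, so every marked block is printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines between literal START and END markers.
# Usage: between_markers START END < file
between_markers() {
    local start=$1 end=$2 line inside=false
    # IFS= keeps surrounding whitespace; -r keeps backslashes literal;
    # the || [[ -n $line ]] test handles a final line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if $inside; then
            # End marker: leave the block without printing it
            [[ $line == *"$end"* ]] && inside=false && continue
            printf '%s\n' "$line"
        elif [[ $line == *"$start"* ]]; then
            inside=true          # start printing from the next line
        fi
    done
}
```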

Answer 4

Try the following:

sed -n '/<!-- this is token 1 -->/,/<!-- this is token 2 -->/p' your_input_file |
        grep -Ev '<!-- this is token . -->'

Answer 5

Maybe sed and awk have more elegant solutions, but I have a "poor man's" approach with grep, cut, head, and tail.

#!/bin/bash

dataFile="/path/to/some/data.txt"
startToken="token 1"
stopToken="token 2"

startTokenLine=$( grep -n "${startToken}" "${dataFile}" | cut -f 1 -d':' )
stopTokenLine=$( grep -n "${stopToken}" "${dataFile}" | cut -f 1 -d':' )

let stopTokenLine=stopTokenLine-1
let tailLines=stopTokenLine-startTokenLine

head -n "${stopTokenLine}" "${dataFile}" | tail -n "${tailLines}"
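Once the two line numbers are known, sed can print the range directly, so there is no need to work out a tail count. This sketch makes the same assumption as the script above, namely that each token matches exactly one line. print_between_lines is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines strictly between the line matching
# START_TOKEN and the line matching STOP_TOKEN.
# Assumes each token matches exactly one line of FILE.
print_between_lines() {
    local file=$1 start_token=$2 stop_token=$3 start stop
    start=$(grep -n -- "$start_token" "$file" | cut -d: -f1)
    stop=$(grep -n -- "$stop_token" "$file" | cut -d: -f1)
    [[ -n $start && -n $stop ]] || return 1   # a marker is missing
    # Nothing lies between adjacent markers
    (( stop - start > 1 )) || return 0
    sed -n "$((start + 1)),$((stop - 1))p" "$file"
}
```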

Answer 6

For anything like this, I'd reach for Perl, with its combination of (amongst others) sed and awk capabilities. Something like this (beware: untested):

use strict;
use warnings;

my $recording = 0;
my @results = ();
while (<STDIN>) {
   chomp;
   if (/token 1/) {
      $recording = 1;
   }
   elsif (/token 2/) {
      $recording = 0;
   }
   elsif ($recording) {
      push @results, $_;
   }
}
# print the collected lines, one per line
print "$_\n" for @results;

Answer 7

sed -n "/TOKEN1/,/TOKEN2/p" <YOUR INPUT FILE> | sed -e '/TOKEN1/d' -e '/TOKEN2/d'
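Both steps fit into one sed process. Inside the TOKEN1,TOKEN2 range, the script deletes the two marker lines and prints everything else. In this sketch, extract_sed is a hypothetical helper. Its arguments are used as basic regular expressions, so they must not contain an unescaped /.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract_sed START_RE END_RE FILE
extract_sed() {
    local start=$1 end=$2 file=$3
    # In the range: delete (skip) the marker lines, print all other lines
    sed -n "/$start/,/$end/{/$start/d;/$end/d;p;}" "$file"
}
```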

7 answers:

Answer 1: score 41, 2011-02-01 01:28:53
Answer 2: score 26, accepted, 2011-01-31 23:49:27
Answer 3: score 1, 2017-02-20 16:44:43
Answer 4: score 1, 2011-01-31 23:47:56
Answer 5: score 1, 2011-01-31 23:58:06
Answer 6: score 0, 2011-01-31 23:46:47
Answer 7: score 0, 2020-11-12 16:40:42
