
How do I select multiple lines between markers (*), excluding the last one (using sed)? And how do I select all the rest?

I have a giant .txt file formatted as follows (each non-blank line starts with three spaces):

   unwanted text
   unwanted text

   *wanted text
   abc
   def

   *wanted text 2
   content
   content

   *wanted text 3
   content
   content

   (...)

I'm looking for code that returns only the lines from the first "   *" occurrence up to (but excluding) the second "   *" occurrence.

After going through several Stack Overflow posts, I've managed to put together the following working code, using Ubuntu (GNU/Linux):

sed -n -e '/^   \*/{p;q}' bigfile.txt && sed -e '1,/   \*/d' -e '/   \*/,$d' bigfile.txt

It gives me the following output, which is exactly what I want:

   *wanted text
   abc
   def
<blank line, also wanted>

Although it's exactly the output I want, you'll agree it's rather clumsy code, since I have to call sed twice. At first I had only the second part (after "&&"), which returned the right lines except for the first one (*wanted text). I then prepended the first part (before "&&") so that I also get the first line of the wanted block. Nothing else I've tried has given me a better result.

It bears repeating that this is a very big file, and I'll be running this repeatedly in a script, so if possible, a q (quitting after finding the first result) is preferable.
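
For reference, a minimal single-sed sketch of the first part, assuming (as described above) that every marker line starts with exactly three spaces followed by "*". Once the first marker is seen, the script loops with p and n, and q stops sed as soon as the next marker is read, so the rest of the big file is never scanned. It uses only standard sed commands (:, b, n, p, q); the function name first_block is just an illustration.

```shell
# first_block FILE (hypothetical helper name): print the first "   *" block,
# i.e. from the first marker line up to, but not including, the next one.
# Inside the braces: print the current line (p), read the next one (n),
# quit if it is a marker (q), otherwise loop back (b loop).
# With -n, neither n nor q prints anything on its own, and n just ends the
# script at end of input, so a last block with no following marker is
# still printed in full.
first_block() {
  sed -n '
/^   \*/{
  :loop
  p
  n
  /^   \*/q
  b loop
}
' "$1"
}
```

Usage: first_block bigfile.txt. GNU sed should also accept the same script on one line, since it lets a ";" end a label: sed -n '/^   \*/{:a;p;n;/^   \*/q;ba;}' bigfile.txt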

After this is done, I need something that takes the result of the last command as input, so I can get exactly the whole text EXCEPT the previous result, like this:

   unwanted text
   unwanted text

   *wanted text 2
   content
   content

   *wanted text 3
   content
   content

   (...)
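
The inverse can be sketched with one sed call as well, again assuming the three-space "*" marker format. This sketch reads the original file directly instead of the first command's output: lines before the first marker are printed, the first block is skipped, and everything from the second marker to the end is printed. Because it has to emit the rest of the file, it cannot quit early. The function name is hypothetical.

```shell
# all_but_first_block FILE (hypothetical helper name): print everything
# except the first marker block.
# - lines before the first marker fall through to the final p;
# - on the first marker, the skip loop reads lines (n) without printing
#   until the next marker, which then starts the rest loop;
# - the rest loop prints every remaining line until n hits end of input.
all_but_first_block() {
  sed -n '
/^   \*/{
  :skip
  n
  /^   \*/!b skip
  :rest
  p
  n
  b rest
}
p
' "$1"
}
```

Usage: all_but_first_block bigfile.txt > remainder.txt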

So, in summary, my 2 questions are:

  • Is there a way to get the first desired output described above with a sed one-liner, without calling sed twice (and preferably quitting once the excerpt is found, so it won't scan the whole big file)? I'm pretty sure there's a more elegant solution.
  • How can I get as output 'the whole text except the result of the previous question' (a kind of 'inverse' output)? I have no software requirements; I just need it so I can run the previous step again and again on an "ever-updating" input and handle each output of the first command according to specific conditions.

I hope I'm clear enough. Please ask if any detail is missing. Thank you very much for your attention!

awk to the rescue!

$ awk '$1 ~ /^\*/ {if (f) exit; f=1} f' file

   *wanted text
   abc
   def
   <-- the blank line is here (the formatter swallows it)
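
The first awk one-liner, spelled out with comments (same logic; the function name is only for illustration). It relies on awk's default field splitting, which ignores leading blanks, so $1 is the first word even though every line starts with three spaces. The exit is what gives the early quit asked for in the question.

```shell
# first_block_awk FILE (hypothetical name): print from the first line whose
# first field starts with "*" up to, but not including, the next such line.
first_block_awk() {
  awk '
    $1 ~ /^\*/ {        # marker line
      if (f) exit       # second marker: stop reading the file
      f = 1             # first marker: start printing
    }
    f                   # no action given: print the line while f is set
  ' "$1"
}
```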

For the second part:

$ awk '$1 ~ /^\*/ {f++} !f || f > 1' file

   unwanted text
   unwanted text

   *wanted text 2
   content
   content

   *wanted text 3
   content
   content

   (...)
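
For the workflow described in the question (running the first step again and again on an ever-updating input), the two awk commands can be chained in a loop. This is only a sketch: handle_block is a hypothetical placeholder for the real per-block conditions (here it just prints each block's marker line), and the loop works on a temporary copy so the original file is left untouched. Note that the second awk command rewrites the whole remaining file on every pass.

```shell
# Hypothetical per-block hook: replace the body with the real conditions.
# Here it only prints the first line of the block it receives on stdin.
handle_block() {
  sed -n '1p'
}

# process_blocks FILE: repeatedly split the first "*" block off a working
# copy of FILE, pass it to handle_block, and keep only the remainder.
process_blocks() {
  work=$(mktemp) || return 1
  rest=$(mktemp) || { rm -f "$work"; return 1; }
  cp "$1" "$work"
  # exit status 0 only while the working copy still has a marker line
  while awk '$1 ~ /^\*/ {found = 1; exit} END {exit !found}' "$work"; do
    # part 1: the first block (awk stops at the second marker)
    awk '$1 ~ /^\*/ {if (f) exit; f = 1} f' "$work" | handle_block
    # part 2: everything except that block becomes the new working copy
    awk '$1 ~ /^\*/ {f++} !f || f > 1' "$work" > "$rest"
    mv "$rest" "$work"
  done
  rm -f "$work" "$rest"
}
```

Usage: process_blocks bigfile.txt. The lines before the first marker are never handed to handle_block; they simply remain in the working copy until no markers are left.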

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM