
How to replace a multi-line pattern in Linux

Assume I have a file called text.txt. In text.txt, the following pattern occurs a number of times:

/**
 * @something
**/

I want to replace this pattern with an empty string. What is the easiest Linux command to do this?

  1. "grep" does not work because this is a multi-line pattern.
  2. I tried "sed", but I could not get it to work.
  3. I guess "awk" may make this easy, but "awk" seems complicated and I am not familiar with it.

Suppose that our input file is:

$ cat text.txt
before
/**
 * @something
**/
after

We can filter out the comments with awk:

$ awk '/\/\*\*/ {c=1; next} /\*\*\// {c=0; next} c==0 {print}' text.txt
before
after

The awk script uses a variable, c, as a flag. At the start, c is 0 (awk treats an unset variable as 0), signaling that we are not in a comment. When a line containing the start-of-comment marker, /**, appears, we set c=1 and skip the line. c stays at 1 until the next line containing the end-of-comment marker, **/, appears, at which point c is set back to 0 and that line is skipped too. A line is printed only if c is 0, so anything between the opening and closing comment lines, whatever its format, is not printed.

The code looks a little odd because both / and * have special meaning inside an awk regular expression: / delimits the regex and * is the repetition operator. So, they both need to be escaped with backslashes. Thus, for example, the regular expression for the start-of-comment line looks like \/\*\*, while the regular expression for the end-of-comment line looks like \*\*\/.
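Since the question mentions sed, here is a minimal sketch of the same whole-line deletion using a sed address range. It is offered for comparison only, not as part of the answer above. The temporary file and the variable name out are assumptions made so the snippet is self-contained. Like the awk version, it drops the marker lines whole, so it does not handle text that shares a line with /** or **/.

```shell
# Recreate the sample text.txt from above in a temporary file (hypothetical setup).
tmp=$(mktemp)
printf '%s\n' 'before' '/**' ' * @something' '**/' 'after' > "$tmp"

# Delete every line from one containing /** through the next one containing **/.
# Inside the sed regex, / and * are escaped for the same reasons as in awk.
out=$(sed '/\/\*\*/,/\*\*\//d' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

On the sample input this should print only the lines before and after.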

More complex input files

Suppose the input file has a more complex structure such as illustrated in JS's example:

$ cat file
something
/**
 * @something
**/ random
hello
hi /**
 * @something
**/ bye
hola
gracias
bye

We can handle this with awk as follows:

$ awk -v RS='\\*\\*/\n*' '{sub(/\n*\/\*\*.*/,"",$0); print $0}' file
something
 random
hello
hi 
 bye
hola
gracias
bye

The above was tested with GNU awk. Since it uses a multi-character record separator, it may not work with older versions of awk.

While awk normally reads a file line by line, in our version above we have set the record separator, RS, to match the end of a comment. Then, we delete everything from the comment start to the end of the record and print the record.
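For an awk without multi-character RS support, a line-by-line version is possible. The following is a sketch under assumptions, not the method used above: it scans each line with index() and substr(), and the variable names in_comment, kept, and touched are made up for illustration. The temporary file recreates the more complex sample input.

```shell
# Recreate the more complex sample input in a temporary file (hypothetical setup).
tmp=$(mktemp)
printf '%s\n' 'something' '/**' ' * @something' '**/ random' 'hello' \
    'hi /**' ' * @something' '**/ bye' 'hola' 'gracias' 'bye' > "$tmp"

out=$(awk '
BEGIN { in_comment = 0 }
{
    line = $0
    kept = ""                # text on this line that lies outside any comment
    touched = in_comment     # 1 if a comment overlaps this line
    while (line != "") {
        if (in_comment) {
            i = index(line, "**/")
            if (i == 0) break                   # comment continues on the next line
            line = substr(line, i + 3)          # resume after the closing marker
            in_comment = 0
        } else {
            i = index(line, "/**")
            if (i == 0) { kept = kept line; break }
            kept = kept substr(line, 1, i - 1)  # keep the text before the marker
            line = substr(line, i + 3)
            in_comment = 1
            touched = 1
        }
    }
    # Drop lines that consisted of nothing but comment text.
    if (!(touched && kept == "")) print kept
}' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

On this input the sketch should reproduce the output shown above, including the trailing space after "hi". Using index() instead of a regex means the markers are matched as literal strings, so nothing needs escaping.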

Here is a simple awk idiom that uses a flag to select the text between two patterns. As written, it prints only the lines strictly between the markers:

$ cat file
before
/**
 * @something
**/
after

$ awk '/\*\*\//{f=0} f; /\/\*\*/{f=1}' file
 * @something

When the START/END lines themselves should be excluded, this is one of the simplest awk idioms for the job:

awk '/END/{f=0} f; /START/{f=1}'
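The flag idiom selects the block between the markers, while the question asks for the block to be removed. A minimal sketch of the inverted form follows, assuming the markers sit on lines of their own. It tests !f instead of f and swaps the order of the two flag assignments so that the marker lines are dropped as well. The temporary file and the variable name out are assumptions for illustration.

```shell
# Recreate the sample file in a temporary file (hypothetical setup).
tmp=$(mktemp)
printf '%s\n' 'before' '/**' ' * @something' '**/' 'after' > "$tmp"

# Set the flag before the print test and clear it after the test, so the
# marker lines are dropped along with everything between them.
out=$(awk '/\/\*\*/{f=1} !f; /\*\*\//{f=0}' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

On the sample input this should print only the lines before and after.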

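The following uses GNU awk, whose support for a multi-character RS lets us read the whole file as one string (RS='^$' never matches in a non-empty file, so the entire file becomes a single record).

If you specifically want to remove just the string you posted, that would be: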

$ cat file
foo/**
 * @something
**/bar and more/**
 * @something
**/stuff

$ awk -v RS='^$' -v ORS= -v pat='/**
 * @something
**/' '{
    while ( s=index($0,pat) ) {
        $0 = substr($0,1,s-1) substr($0,s+length(pat))
    }
    print
}' file
foobar and morestuff

or, if you actually just want to remove everything from each /** up to and including the next / (here, the one that ends **/), all you need is:

$ awk -v RS='/[*][*][^/]+/' -v ORS= '1' file
foobar and morestuff

$ cat text.txt | egrep -v "[/]" | egrep -v "[*] @" > newtext.txt

This can do the job, but you may have to modify it slightly depending on what else is in the file.
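One such modification, sketched here as an assumption and not as part of the answer above, is to anchor the patterns so that only the three comment lines match. The unanchored "[/]" filter would also delete any other line that contains a slash. The sample line "see a/b for details" is made up to show the difference, and the temporary file and variable name out are illustrative.

```shell
# Sample input with one extra line that contains a slash (hypothetical).
tmp=$(mktemp)
printf '%s\n' 'before' 'see a/b for details' '/**' ' * @something' '**/' 'after' > "$tmp"

# Anchored patterns: a line that is exactly /**, a line that starts with " * @",
# and a line that is exactly **/. Any other line is kept.
out=$(grep -v -e '^/\*\*$' -e '^ \* @' -e '^\*\*/$' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

This keeps the "see a/b for details" line, which the unanchored version would drop. Like the original pipeline, it only works when each part of the comment sits on its own line.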
