
Regex find and replace over multi lines in Shell

My problem is similar to "shell script: search and replace over multiple lines", with a small exception.

In the question linked the user wants to do this:

source:
[stuff before]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
  [stuff here, possibly multiple lines]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]    

target:
[stuff before]
[new content]
[stuff after]

My problem is similar, I want to do this:

source:
[stuff before]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
  [this]
<!--WIERD_SPECIAL_COMMENT_END-->
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
  [not this]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]    

target:
[stuff before]
[new content]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
  [not this]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]

In a proper multiline regex this is easy to do:

/<!--WIERD_SPECIAL_COMMENT_BEGIN-->.*[this].*<!--WIERD_SPECIAL_COMMENT_END-->/m

but the answer suggested in the linked question uses regexes as range addresses, which doesn't allow checking the lines between the two bounds.

Is there any way to add all the lines in a range to the pattern buffer so I can regex over all the lines at once? E.g.:

sed '
    #range between comment beginning and comment end
    /<!--WIERD_SPECIAL_COMMENT_BEGIN-->/,/<!--WIERD_SPECIAL_COMMENT_END-->/
    #Do something to add the lines in this range to pattern buffer
    /.*[this].*/d
    #Delete all the lines if [this] is in the pattern buffer
' <in.txt >out.txt

With Perl, it's relatively simple.

perl -0777pe 's/<!--BEGIN-->\n(?:(?!<!--END-->\n).)*?\[this\].*?\n<!--END-->\n/[new content]\n/s' in.txt

The benefits offered by Perl are (a) the -0777 "slurp mode", which pulls in the entire input file in one go instead of sed's line-at-a-time processing; (b) the /s regex flag, which allows dot to match a newline; (c) the stingy repetition operators *? and friends, which make the repetition match as little as possible instead of as much as possible; and finally (d) the negative lookahead (?!...), which lets you inhibit matching where the lookahead expression matches. (Without this, even stingy matching would match across an end delimiter if there was a "false" starting delimiter in the "stuff before" text.) And of course, (e) Perl is a general-purpose programming language, whereas sed is only suitable for relatively simple text processing tasks.

(I used simpler beginning and ending delimiters. I hope "wierd" was an intentional misspelling.)

I am a beginner, so this surely is not the best way to do it.


I've done something similar in three steps. Assuming you're running on Linux, you can do the following:

1) Replace all occurrences of a newline in your file with a special character:

cat originalText.txt | tr '\n' '~' > temp

2) Perform your regex using your favorite method (I used perl) placing an instance of the special character at each position you expect a newline. Make sure to keep the special newline character intact.

3) Run the first command in reverse:

cat temp | tr '~' '\n' > modText.txt

I hope this helps.

Sure, use the hold space. For example:

sed -n '/begin/,/end/{ /begin/{h;d};H}; /end/{g;s/\n/<newline>/gp}'

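This prints each block between lines matching 'begin' and 'end' as a single line, with the newlines replaced by the text <newline>. Because of -n, lines outside the block are not printed.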
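
Applied to the question, the same hold-space idea might look like the sketch below. It collects each block in the hold space, then tests the whole block at once when the end delimiter arrives. The shortened <!--BEGIN--> / <!--END--> delimiters and the variable names are assumptions for illustration, and it relies on "." matching a newline inside the pattern space, as GNU sed does.

```shell
#!/bin/sh
input='[stuff before]
<!--BEGIN-->
  [this]
<!--END-->
<!--BEGIN-->
  [not this]
<!--END-->
[stuff after]'

result=$(printf '%s\n' "$input" | sed -n '
/<!--BEGIN-->/,/<!--END-->/{
  # First line of a block: start a fresh copy in the hold space.
  /<!--BEGIN-->/{
    h
    d
  }
  # Every other line of the block is appended to the hold space.
  H
  # Last line: fetch the whole block and test it in one go.
  /<!--END-->/{
    g
    /\[this\]/s/.*/[new content]/
    p
  }
  d
}
# Lines outside any block are printed unchanged.
p
')

printf '%s\n' "$result"
```

Blocks that do not contain [this] are printed as they were, since the substitution only runs when the collected block matches.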

This might work for you (GNU sed):

sed ':a;$!N;/^<!--WIERD_SPECIAL_COMMENT_BEGIN-->/!{P;D};/<!--WIERD_SPECIAL_COMMENT_END-->$/!ba;/\[this\]/s/.*/[new content]/;p;d' file

You can do it like this with sed:

parse.sed

/BEGIN/ {               # If we encounter BEGIN
  :a                    # Read all until END
  N                     # into pattern space
  /END/!ba              # /
  /\[this\]/s/.*/[new content]/  # If the block contains [this], replace the whole block
}

Run it like this:

sed -f parse.sed infile

Output:

[stuff before]
[new content]
<!--WIERD_SPECIAL_COMMENT_BEGIN-->
  [not this]
<!--WIERD_SPECIAL_COMMENT_END-->
[stuff after]
