sed replace text matching complicated regex pattern

Question

I am in the process of porting an existing database schema to Postgresql.

I need to replace occurrences of the word 'go' with a semi comma.

I have noticed that the word 'go' appears in the text, in the following pattern:

[non empty string (SQL)]
[followed by one or more new lines]
[followed by one or more white space]
[followed by the word 'go']
[Followed by one or more new lines]

I want to replace the above pattern with the following one:

[non empty string (SQL)]
[followed by ';']
[Followed by TWO new lines]

I am trying to build a regex expression which I can use with sed, to perform the replacement described above - but I am relatively new to regex.

For the purpose of clarity, I have included sample text BEFORE and AFTER the substitution I want to achieve:

-- Original File contents below -------



go
CREATE TABLE foobar
(
    f1    INT,
    f2    INT,
    f3    FLOAT,
        f4    VARCHAR(32) NOT NULL,
    f5    INT,
    f6    datetime,
        f7    smallint
)


go

GRANT UPDATE, INSERT, DELETE, SELECT ON foobar TO dbusr
go
CREATE UNIQUE INDEX idxu_foobar ON foobar (f1, f2)

go


--- REPLACED FILE CONTENTS -----------



go
CREATE TABLE foobar
(
    f1    INT,
    f2    INT,
    f3    FLOAT,
        f4    VARCHAR(32) NOT NULL,
    f5    INT,
    f6    datetime,
        f7    smallint
);

GRANT UPDATE, INSERT, DELETE, SELECT ON foobar TO dbusr;
CREATE UNIQUE INDEX idxu_foobar ON foobar (f1, f2);

Can anyone help with the expression to use to achieve this, so I can execute: sed -i 's/original_match_expr/replacement_expr/g' myfile.sql

Answer 1

Try following solution using the GNU version of sed :

sed -ne ':a; $! { N; ba }; s/\([^[:space:]]\)[[:space:]]*go/\1;/g; p' infile

It reads the whole file to a buffer and replace all go words and all blanks that precede it with a semicolon. It yields:

go
CREATE TABLE foobar
(
    f1    INT,
    f2    INT,
    f3    FLOAT,
        f4    VARCHAR(32) NOT NULL,
    f5    INT,
    f6    datetime,
        f7    smallint
);

GRANT UPDATE, INSERT, DELETE, SELECT ON foobar TO dbusr;
CREATE UNIQUE INDEX idxu_foobar ON foobar (f1, f2);

EDIT to add an explanation (see comments):

It's not as hard as it seems.

:a; $! { N; ba } :a; $! { N; ba } is a loop that reads every line of input to a buffer.

[[:space:]] matches any whitespace character and [^[:space:]] negates it. So the substitution command replaces from last non-whitespace character until the word go . If there is only whitespace before the go word as in the first case, the substitution doesn't match and does replace nothing.

Answer 2

With gawk

awk -v RS='\\s*go' '{print $0""(RT ~ /go/? ";\n\n": "")}' file.txt

The record separator RS is set to 0 or more space characters followed by go . GNU awk then treats the block of text between two successive instances of record separators as a record. So print the record followed by a custom record separator ( ; followed by two newlines)

sed replace text matching complicated regex pattern

Question

2 answers

solution1
1 2013-11-02 20:18:42

solution2
1 ACCPTED 2013-11-02 20:21:46

sed replace text matching complicated regex pattern

Question

2 answers

solution1 1 2013-11-02 20:18:42

solution2 1 ACCPTED 2013-11-02 20:21:46

solution1
1 2013-11-02 20:18:42

solution2
1 ACCPTED 2013-11-02 20:21:46