Modify XML tags matching a specific pattern with regex tools

Question

I have a large xml file with a bunch of database table definitions that look like this:

table name="dbname.tablename" lots of text here>

I would like to change the closing bracket on each matching line (not all lines start with table name="") so that the original line is retained but slonyId="number" is inserted before the >. To make things a bit more complex, I'd like the slonyId number to be incremented, starting at 0, so that if I have 1000 table definitions, the first one looks like:

table name="dbname.tablename" lots of text here slonyId="0">

And the last one looks like:

table name="dbname.tablename" lots of text here slonyId="999">

What is the best approach to this problem?

Thanks in advance!

Answer 1

Adding a solution from JS, which prints every line and numbers only the matching ones with its own counter:

awk -F'>' '/table name/{$NF="slonyid="q x++ q FS}1' q='"' inputFile

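Or try this, which prints only the matching lines and numbers them from the line number (NR-1), so it assumes every line in the file is a table line: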

awk -F'>' '/table name/{print $(NF-1)" slonyid""=""\""NR-1"\""">"}' inputFile

Adding test:

$ cat temp.txt
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>
table name="dbname.tablename" lots of text here>


$ awk -F'>' '/table name/{print $(NF-1)" slonyid""=""\""NR-1"\""">"}' temp.txt
table name="dbname.tablename" lots of text here slonyid="0">
table name="dbname.tablename" lots of text here slonyid="1">
table name="dbname.tablename" lots of text here slonyid="2">
table name="dbname.tablename" lots of text here slonyid="3">
table name="dbname.tablename" lots of text here slonyid="4">
table name="dbname.tablename" lots of text here slonyid="5">
table name="dbname.tablename" lots of text here slonyid="6">
table name="dbname.tablename" lots of text here slonyid="7">
table name="dbname.tablename" lots of text here slonyid="8">
table name="dbname.tablename" lots of text here slonyid="9">
table name="dbname.tablename" lots of text here slonyid="10">
table name="dbname.tablename" lots of text here slonyid="11">
table name="dbname.tablename" lots of text here slonyid="12">
table name="dbname.tablename" lots of text here slonyid="13">
table name="dbname.tablename" lots of text here slonyid="14">
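
For comparison, here is a minimal Python sketch of the counter-based approach used by the first awk command: every line is passed through, and a separate counter (rather than the line number) advances only on matching lines. The function name add_slony_ids and the sample lines are made up for illustration, and the sketch assumes lines without trailing newlines. Like the awk commands above, it matches "table name" anywhere in the line.

```python
def add_slony_ids(lines, attr="slonyid"):
    """Insert attr="N" before the final '>' of each line containing 'table name'."""
    counter = 0
    out = []
    for line in lines:
        if "table name" in line and line.rstrip().endswith(">"):
            body = line.rstrip()[:-1]  # drop the closing '>'
            out.append(f'{body} {attr}="{counter}">')
            counter += 1  # advances only on matching lines
        else:
            out.append(line)  # non-matching lines pass through unchanged
    return out


# Hypothetical sample input with a non-table line in the middle.
sample = [
    'table name="dbname.tablename" lots of text here>',
    'index name="dbname.tablename" lots of text here>',
    'table name="dbname.tablename" lots of text here>',
]
result = add_slony_ids(sample)
print("\n".join(result))
```

With NR-1 numbering the second table line here would get 2 instead of 1, which is why a dedicated counter is the safer choice when other lines are interleaved.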

Answer 2

Code for GNU sed:

sed = file|sed 'N;s/\n/\t/;/\S\+\s\+table name/!d'|sed =|sed 'N;s/\n/\t/;s/\(\S\+\)\s\+\([^>]\+\)>/\2 slonyid="\1">/;s#\(\S\+\)\s\+\(.*\)#\1 s/.*/\2/#'|sed -f - file

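This is a pure sed solution with 4 pipes. Note that the numbering starts at 1 rather than 0, and that the generated sed script will break if a table line contains a / character.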

$ cat file
table name="dbname.tablename" lots of text AAA here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text BBB here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text CCC here>
index name="dbname.tablename" lots of text XXX here>
table name="dbname.tablename" lots of text DDD here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text EEE here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text FFF here>
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>

$ sed = file|sed 'N;s/\n/\t/;/\S\+\s\+table name/!d'|sed =|sed 'N;s/\n/\t/;s/\(\S\+\)\s\+\([^>]\+\)>/\2 slonyid="\1">/;s#\(\S\+\)\s\+\(.*\)#\1 s/.*/\2/#'|sed -f - file
table name="dbname.tablename" lots of text AAA here slonyid="1">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text BBB here slonyid="2">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text CCC here slonyid="3">
index name="dbname.tablename" lots of text XXX here>
table name="dbname.tablename" lots of text DDD here slonyid="4">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
table name="dbname.tablename" lots of text EEE here slonyid="5">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
table name="dbname.tablename" lots of text FFF here slonyid="6">
index name="dbname.tablename" lots of text XXX here>
index name="dbname.tablename" lots of text YYY here>
index name="dbname.tablename" lots of text ZZZ here>
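
The pipeline above works in two passes: the first four sed commands work out which line numbers hold table lines and generate one "N s/.*/new text/" command per table line, and the final sed -f - runs that generated script against the file. A rough Python sketch of the same two-pass idea follows. The function names are hypothetical, start=1 mirrors the numbering in the output above, and the match is anchored at the start of the line, which is an assumption about the input.

```python
import re


def build_replacements(lines, start=1):
    """Pass 1: map the line number of each table line to its rewritten text."""
    replacements = {}
    seq = start
    for lineno, line in enumerate(lines, start=1):
        if re.match(r"table name", line):
            # Insert the attribute in front of the closing '>'.
            replacements[lineno] = re.sub(r">\s*$", f' slonyid="{seq}">', line, count=1)
            seq += 1
    return replacements


def apply_replacements(lines, replacements):
    """Pass 2: swap in the rewritten text, leaving all other lines untouched."""
    return [replacements.get(n, line) for n, line in enumerate(lines, start=1)]


# Hypothetical sample modelled on the file shown above.
lines = [
    'table name="dbname.tablename" lots of text AAA here>',
    'index name="dbname.tablename" lots of text XXX here>',
    'table name="dbname.tablename" lots of text BBB here>',
]
replacements = build_replacements(lines)
rewritten = apply_replacements(lines, replacements)
print("\n".join(rewritten))
```

Because the replacement text is kept in a dictionary instead of being pasted into a generated sed script, characters such as / in the table lines need no escaping here.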

Answer 3

This perl one-liner will do the trick if I understand your question correctly:

perl -pi.bak -e 'BEGIN {$count=0}; if (/^table name=/) { s/^(table name=.*)>$/$1 slonyId="$count">/; $count++}' inputFile.xml

These options tell perl to loop over the lines of each given file and print each line after running the code (-p), and to edit the file in place while keeping a backup named "orig_filename.bak" (-i.bak):

perl -pi.bak -e

This initializes the $count variable:

BEGIN {$count=0};

This does the replacement you asked for on each matching line and then increments $count:

if (/^table name=/) { s/^(table name=.*)>$/$1 slonyId="$count">/; $count++}

Then just provide the list of filenames at the end:

inputFile.xml

This is not a very robust solution and could break if any lines in your file don't match the description you gave above, but it should work for your problem.

I think I'm too new to comment on the other solutions directly, but in my tests FDinoff's solution will add the slonyId to a line that looks like this:

not a table name="dbname.tablename" lots of text here>

And Amit's solution will add the slonyId to every line, not just lines that begin with "table name".
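
The difference comes down to whether the pattern is anchored to the start of the line. A small Python sketch of that distinction, using the example line above (the variable names are only for illustration):

```python
import re

lines = [
    'table name="dbname.tablename" lots of text here>',
    'not a table name="dbname.tablename" lots of text here>',
]

# re.search finds the pattern anywhere in the line (like an unanchored /table name=/).
unanchored = [bool(re.search(r"table name=", line)) for line in lines]

# re.match only matches at the start of the line (like /^table name=/).
anchored = [bool(re.match(r"table name=", line)) for line in lines]

print(unanchored)  # [True, True]
print(anchored)    # [True, False]
```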

Answer 4

vim solution

Use :global to find the lines that start with table name= and replace the > on each of them with slonyId="number">. You can do this with the following two lines:

:let i = 0
:g/^table name=/s/>/\=' slonyId="' . i . '"' . submatch(0)/ | let i=i+1

The first line initializes i to 0. For every line matched by :g, the substitute uses an expression replacement (\=) that concatenates the attribute text, the current value of i, and the matched > (submatch(0)). After each substitute, i is incremented so that the next matching line gets the next number in the sequence.

Answer 5

You should never edit XML files using line-by-line string manipulations. XML isn't structured like that. Always use a proper XML parser, like Perl's XML::LibXML:

#!/usr/bin/env perl

use strict;
use warnings;
use XML::LibXML;

my $xml = XML::LibXML->new->parse_file('/path/to/input.xml');

my $i = 0;
$_->setAttribute('slonyId', $i++) for $xml->findnodes('//table');

$xml->toFile('/path/to/output.xml');
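
The same parser-based idea can be sketched in Python with the standard library's xml.etree.ElementTree. This assumes the file is well-formed XML with table elements and no namespaces; the schema/index/column layout below is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; a real run would call ET.parse() on the file instead.
xml_text = """<schema>
  <table name="db.a"><column name="id"/></table>
  <index name="db.a_idx"/>
  <table name="db.b"/>
</schema>"""

root = ET.fromstring(xml_text)

# iter("table") walks every <table> element in document order, like //table.
for i, table in enumerate(root.iter("table")):
    table.set("slonyId", str(i))

output = ET.tostring(root, encoding="unicode")
print(output)
```

Because the parser works on elements rather than lines, it does not matter whether a table tag spans several lines or shares a line with other tags.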

solution1
3 ACCEPTED 2013-06-29 03:47:34

solution2
2 2013-06-29 07:43:51

solution3
1 2013-06-29 04:20:15

solution4
0 2013-06-29 03:57:49

solution5
0 2013-06-29 09:19:30