I have a file like the one below:
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-29T17:27:53" NAME="Quit" CONNECTION_ID="12" STATUS="0"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="create table stamp like paper"/>
Each record begins with <AUDIT_RECORD and ends with "/>, and a record may span multiple lines.
I need the output to look like this:
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="create table stamp like paper"/>
For that I used:
sed -n "/Query/,/\/>/p" file.txt
but it prints the entire file, including the record containing the string "Quit".
Can anyone help me with this? Also, is it possible to match the exact string "Query" (like grep -w "Query")?
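Why the range misfires: when the end address of a sed range is a regex, sed only starts looking for it on the line after the one that opened the range. A one-line Query record therefore opens a range that is closed by the next line containing "/>", which here is the Quit record. A minimal record-aware sketch, assuming records are delimited exactly as described (they open with <AUDIT_RECORD and close with "/>") and that "/>" never appears inside an attribute value: collect lines until the closing "/>", then print the whole record only if it contains NAME="Query". Matching the full attribute NAME="Query" rather than bare Query gives the exact, grep -w-like match asked about. The helper name query_records and the sample variable are hypothetical, for illustration only.

```shell
#!/bin/sh
# Sample input from the question, with records spread across several lines.
sample='<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query"
CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-29T17:27:53"
NAME="Quit" CONNECTION_ID="12" STATUS="0"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10"
STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29"
NAME="Query"
CONNECTION_ID="10"
STATUS="0"
SQLTEXT="create table stamp like paper"/>'

# Hypothetical helper: read records on stdin and print, with their original
# line breaks, only those whose NAME attribute is exactly "Query".
# When a line starts a record, keep appending lines (N) until "/>" is in the
# pattern space, then print the collected record if it matches.
query_records() {
  sed -n '
/<AUDIT_RECORD/{
:a
/\/>/!{
N
ba
}
/NAME="Query"/p
}'
}

printf '%s\n' "$sample" | query_records
```

Because each record is tested as a whole, the Quit record is dropped even when its NAME attribute sits on a different line from <AUDIT_RECORD.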
With GNU awk you can set RS to more than one character (it is then treated as a regular expression):
$ cat file
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query"
CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-29T17:27:53"
NAME="Quit" CONNECTION_ID="12" STATUS="0"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10"
STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29"
NAME="Query"
CONNECTION_ID="10"
STATUS="0"
SQLTEXT="create table stamp like paper"/>
$
$ gawk -v RS='\\/>\n' -v ORS= '/Query/{print $0 RT}' file
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query"
CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10"
STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29"
NAME="Query"
CONNECTION_ID="10"
STATUS="0"
SQLTEXT="create table stamp like paper"/>
$
$ gawk -v RS='\\/>\n' -v ORS= '/Query/{$1=$1; print $0 RT}' file
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10" STATUS="0" SQLTEXT="create table stamp like paper"/>
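The gawk answer relies on a multi-character (regex) RS, which POSIX awk does not guarantee. A hedged, portable sketch of the same idea for awks that honor only a single-character RS: accumulate lines until the closing "/>", then test the assembled record. It tests NAME="Query" rather than bare Query, which also covers the exact-match part of the question. Like the $1=$1 version it prints each record on one line, but here lines are simply joined with a single space, so any indentation inside a record would be kept. The helper name query_records_awk is hypothetical.

```shell
#!/bin/sh
# Sample input from the question, with records spread across several lines.
sample='<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query"
CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-29T17:27:53"
NAME="Quit" CONNECTION_ID="12" STATUS="0"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10"
STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29"
NAME="Query"
CONNECTION_ID="10"
STATUS="0"
SQLTEXT="create table stamp like paper"/>'

# Hypothetical helper: join each record onto one line and print it only if
# its NAME attribute is exactly "Query". Uses only POSIX awk features.
query_records_awk() {
  awk '
    /<AUDIT_RECORD/ { rec = "" }                  # a new record starts here
    { rec = (rec == "") ? $0 : (rec " " $0) }     # append this line, space-separated
    /\/>/ {                                       # "/>" closes the record
      if (rec ~ /NAME="Query"/) print rec
      rec = ""
    }'
}

printf '%s\n' "$sample" | query_records_awk
```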
I agree with @choroba that an XML parser is the right tool. However, if one isn't available, you could try this awk script (it needs an awk that accepts a multi-character RS, such as GNU awk or mawk):
awk '/Query/{printf "%s%s", RS, $0}' RS='<AUDIT_RECORD' file
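The input looks like XML. Use a proper parser to handle it, especially since records can span multiple lines. If the excerpt is the whole file, wrap it in a single root element first, because XML allows only one top-level element. For example, with xsh: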
open file.xml ;
remove //AUDIT_RECORD[not(@NAME="Query")] ;
save :b ;
My proposed sed solution (it removes the Quit records, leaving an empty line in their place):
sed 's/<[^>]*"Quit"[^>]*>//' file.txt
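For records spanning multiple lines (GNU sed), join everything into one line first; the second sed needs the g flag so that every Quit record on that line is removed, not just the first:
sed ':q;N;s/\n/ /g;t q' file.txt | sed 's/<[^>]*"Quit"[^>]*>//g'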
To put each record back on its own line (without the leading spaces left over from the join), append:
... | sed 's|/> *|/>\n|g'
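The join-and-split pipeline above can also be turned around to keep only the Query records instead of deleting the Quit ones. A hedged sketch, assuming GNU sed (for \n in the replacement text) and that "/>" only appears at the end of a record: tr flattens the input, sed puts each record on its own line, and grep keeps the lines with an exact NAME="Query" attribute. The helper name query_records_flat is hypothetical.

```shell
#!/bin/sh
# Sample input from the question, with records spread across several lines.
sample='<AUDIT_RECORD TIMESTAMP="2013-07-30T17:52:29" NAME="Query"
CONNECTION_ID="10" STATUS="0" SQLTEXT="show databases"/>
<AUDIT_RECORD TIMESTAMP="2013-07-29T17:27:53"
NAME="Quit" CONNECTION_ID="12" STATUS="0"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29" NAME="Query" CONNECTION_ID="10"
STATUS="0" SQLTEXT="show grants for root@localhost"/>
<AUDIT_RECORD
TIMESTAMP="2013-07-30T17:52:29"
NAME="Query"
CONNECTION_ID="10"
STATUS="0"
SQLTEXT="create table stamp like paper"/>'

# Hypothetical helper: flatten the input, split it back into one record per
# line, then keep only records whose NAME attribute is exactly "Query".
query_records_flat() {
  tr '\n' ' ' |                  # join every line into a single line
    sed 's|/> *|/>\n|g' |        # end each record with a newline, dropping the spaces after it
    grep 'NAME="Query"'          # keep only the Query records
}

printf '%s\n' "$sample" | query_records_flat
```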