
How to extract specific key-value pairs from grep output

Running grep over a folder gives me output like this:

./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">

./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">

I want to extract the file path (without the leading ./ and the .xml extension) and the durationEnd and timeUnit pairs, separated by a delimiter such as ',', like this:

Data1/TEST_Data1, durationEnd="1", timeUnit="D"

Data2/TEST_Data2, durationEnd="2", timeUnit="M"

Please help me achieve this using basic Linux commands.

I would do it with GNU AWK in the following way. Let the content of file.txt be

./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">

./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">

then

awk 'BEGIN{OFS=", ";FPAT="(^[^ ]+xml)|((durationEnd|timeUnit)=\"[^\"]+\")"}{gsub(/\.([/]|xml)/, "", $1);print}' file.txt

output

Data1/TEST_Data1, durationEnd="1", timeUnit="D"

Data2/TEST_Data2, durationEnd="2", timeUnit="M"

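Explanation: I used FPAT to describe what a field looks like instead of what separates fields. A field is either the run of non-space characters at the start of the line that ends in xml, or durationEnd or timeUnit followed by =, a double quote, one or more non-quote characters and a closing double quote. Then I remove every . followed by / or xml from the first field (the . is escaped so that it matches a literal dot rather than any character). Because gsub changes $1, awk rebuilds the record from the extracted fields joined by the output field separator (OFS), which I set to ", ", and print outputs that rebuilt record.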
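FPAT is a GNU awk extension, so the one-liner above needs gawk. As a sketch, the same three steps (pick out the path, durationEnd and timeUnit pieces, strip ./ and .xml, join with ", ") can be spelled out with match() and sub(), which POSIX awk implementations such as mawk and busybox awk also provide. The function name extract_fields is hypothetical, and the sketch assumes each line holds at most one durationEnd and one timeUnit attribute, as in the sample.

```shell
#!/bin/sh
# extract_fields: hypothetical helper. Reads grep output on stdin and prints
# "path, durationEnd=..., timeUnit=..." using only POSIX awk features
# (match, RSTART, RLENGTH, substr, sub), so it does not rely on gawk FPAT.
extract_fields() {
    awk '
    # grab(re): first substring of the current line that matches re, or ""
    function grab(re) {
        return match($0, re) ? substr($0, RSTART, RLENGTH) : ""
    }
    NF {                                # skip blank lines
        path = grab("^[^ :]+[.]xml")   # file name prefix added by grep
        sub(/^\.\//, "", path)          # drop the leading ./
        sub(/\.xml$/, "", path)         # drop the trailing .xml
        print path ", " grab("durationEnd=\"[^\"]*\"") ", " grab("timeUnit=\"[^\"]*\"")
    }'
}

# Demo on the two sample lines from the question
extract_fields <<'EOF'
./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">
./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">
EOF
```

To run it on the saved grep output instead, use extract_fields < file.txt.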

Disclaimer: I tested it only with the samples shown.

(tested in gawk 4.2.1)
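If awk is not an option, sed alone can produce the same lines. This is a minimal sketch with a hypothetical function name, to_csv; it assumes durationEnd appears before timeUnit on each line, as in the sample, and silently skips lines that do not have that shape. The -E flag (extended regular expressions) is accepted by current GNU, BSD and busybox sed.

```shell
#!/bin/sh
# to_csv: hypothetical helper. Captures the path between "./" and ".xml:",
# then the durationEnd="..." and timeUnit="..." pairs, and prints them
# joined by ", ". -n plus the p flag prints only lines that matched.
to_csv() {
    sed -E -n 's|^\./([^ :]*)\.xml:.*(durationEnd="[^"]*").*(timeUnit="[^"]*").*|\1, \2, \3|p'
}

# Demo on the two sample lines from the question
to_csv <<'EOF'
./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">
./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">
EOF
```

Using | as the s command delimiter avoids escaping the / characters in the path.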
