Flatten data with regex

I'm using a program that pings servers and returns results. The resulting data is in a text format that is the exact opposite of what I need, which is CSV. I'm awful with regex, and it seems to me that flattening this data would be very complex.

Data before regex

  1.2.  3.  4 |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  + Cached Name   | 0.000 | 0.000 | 0.000 | 0.000 | 100.0 |
  + Uncached Name | 0.040 | 0.100 | 0.250 | 0.065 | 100.0 |
  + DotCom Lookup | 0.049 | 0.121 | 0.182 | 0.040 | 100.0 |
  ---<-------->---+-------+-------+-------+-------+-------+
                  sub.domain.com
                Some Provider, LLC

  5.6.  7.  8 |  Min  |  Avg  |  Max  |Std.Dev|Reliab%|
  ----------------+-------+-------+-------+-------+-------+
  + Cached Name   | 0.000 | 0.000 | 0.000 | 0.000 | 100.0 |
  + Uncached Name | 0.040 | 0.100 | 0.250 | 0.065 | 100.0 |
  + DotCom Lookup | 0.049 | 0.121 | 0.182 | 0.040 | 100.0 |
  ---<-------->---+-------+-------+-------+-------+-------+
                  bus.domain2.net
                Some Other Provider, Inc

And so on

Here's what I'm trying to extract using regex/sed

Domain,Company,IP,Cached Name Min,Cached Name Max,Cached Name Avg,Cached Name Std.Dev,Cached Name Reliab%,IP,Uncached Name Min,Uncached Name Max,Uncached Name Avg,Uncached Name Std.Dev,Uncached Name Reliab%,IP,Cached Name Min,Cached Name Max,Cached Name Avg,Cached Name Std.Dev,Cached Name Reliab%,IP,DotCom Lookup Min,DotCom Lookup Max,DotCom Lookup Avg,DotCom Lookup Std.Dev,DotCom Lookup Reliab%
sub.domain.com,Some Provider - LLC,1.2.3.4,0.000,0.000,0.000,0.000,100.0,0.040,0.250,0.100,0.065,100.0,0.049,0.182,0.121,0.040,100
bus.domain2.net,Some Other Provider - Inc,5.6.7.8,0.000,0.000,0.000,0.000,100.0,0.040,0.250,0.100,0.065,100.0,0.040,0.250,0.100,0.065,100.0,0.049,0.182,0.121,0.040,100.0

Is this use-case too complex for regex/sed? Does anyone have any clue how I'd achieve this?

Using sed for this might not be the best choice, but sometimes the circumstances or desires override that thought.

So here is a sed solution (it needs GNU sed, because the T and z commands are GNU extensions):

sed -En "s/^\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+)\s*\|.*$/\1\2\3\4\,/;T;{N;N;N;N;s/\n[^|]+\|//g;s/ \| /,/g;s/ \|//;x;N;z;N;N;s/,/ -/g;G;s/\n\s*/,/g;s/^,//;p}" input.txt
  • Look for the line that holds the IP ("1.2.  3.  4") and try to extract it; if that fails, move on to the next input line:
    s/^\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+)\s*\|.*$/\1\2\3\4\,/;T;
  • On success, open the command block:
    {
  • Append the next four lines and delete (or replace with ",") the unneeded parts:
    N;N;N;N;s/\n[^|]+\|//g;s/ \| /,/g;s/ \|//;
  • Store the result in the hold space and discard the following separator line:
    x;N;z;
  • Append the next two lines (domain and company) and replace each "," in them with " -":
    N;N;s/,/ -/g;
  • Append what is stored in the hold space:
    G;
  • Clean up so that "," appears only in the right places:
    s/\n\s*/,/g;s/^,//;
  • Print, and done:
    p}
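For comparison, the same steps can be sketched in Python, which may be easier to maintain than the one-liner. This is not the sed solution itself but a swapped-in alternative. It assumes every block has exactly the eight-line layout of the sample (IP line, separator, three data rows, separator, domain, company). The names IP_LINE, flatten and to_csv are made up for this illustration.

```python
import csv
import io
import re

# Header line of each block, e.g. "  1.2.  3.  4 |  Min  |  Avg  | ..."
IP_LINE = re.compile(r"^\s*(\d+)\.\s*(\d+)\.\s*(\d+)\.\s*(\d+)\s*\|")


def flatten(text):
    """Return one row (a list of strings) per result block in text."""
    lines = text.splitlines()
    rows = []
    i = 0
    while i < len(lines):
        match = IP_LINE.match(lines[i])
        # Skip lines that do not start a block, or blocks cut off at the end.
        if match is None or i + 7 >= len(lines):
            i += 1
            continue
        ip = ".".join(match.groups())
        values = []
        # lines[i + 1] is a separator; the next three lines hold the numbers.
        for data_line in lines[i + 2 : i + 5]:
            cells = data_line.split("|")[1:-1]  # drop the label and the tail
            values.extend(cell.strip() for cell in cells)
        # lines[i + 5] is the closing separator; domain and company follow.
        domain = lines[i + 6].strip()
        company = lines[i + 7].strip().replace(",", " -")
        rows.append([domain, company, ip, *values])
        i += 8
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with one line per block."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

With the data saved as input.txt, as in the sed command, print(to_csv(flatten(open("input.txt").read())), end="") would write the rows to standard output. Because each cell is stripped individually, no stray space is left after the IP.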

Output:

sub.domain.com,Some Provider - LLC,1.2.3.4, 0.000,0.000,0.000,0.000,100.0,0.040,0.100,0.250,0.065,100.0,0.049,0.121,0.182,0.040,100.0
bus.domain2.net,Some Other Provider - Inc,5.6.7.8, 0.000,0.000,0.000,0.000,100.0,0.040,0.100,0.250,0.065,100.0,0.049,0.121,0.182,0.040,100.0

This is very close to the desired output, except for a stray space after "1.2.3.4,".
Is that a problem?

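(Actually, I doubt that the desired output matches the sample input: the header has 26 fields, with "IP" and the "Cached Name" columns repeated, the two data rows have 18 and 23 fields, and the values are ordered Min, Max, Avg while the input is ordered Min, Avg, Max. Can you double-check?)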
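If a header row is wanted as well, one option is to derive it from the row labels and column names of the sample instead of typing it by hand. The Python sketch below does that, assuming the columns should keep the input order (Min, Avg, Max, Std.Dev, Reliab%) and that every block has the same three rows. ROW_LABELS, COLUMNS and build_header are hypothetical names, not part of any tool.

```python
# Row labels and column names exactly as they appear in the sample tables.
ROW_LABELS = ["Cached Name", "Uncached Name", "DotCom Lookup"]
COLUMNS = ["Min", "Avg", "Max", "Std.Dev", "Reliab%"]


def build_header():
    """Return the header fields: 3 fixed ones plus one per row/column pair."""
    header = ["Domain", "Company", "IP"]
    for label in ROW_LABELS:
        header.extend(f"{label} {column}" for column in COLUMNS)
    return header
```

This yields 18 fields, the same number as each line of the sed output, so ",".join(build_header()) can be placed above that output as its first line.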
