
How can I detect a sequence of “hollows” (holes, lines not matching a pattern) bigger than n in a text file?

Case scenario:

$ cat Status.txt
1,connected
2,connected
3,connected
4,connected
5,connected
6,connected
7,disconnected
8,disconnected
9,disconnected
10,disconnected
11,disconnected
12,disconnected
13,disconnected
14,connected
15,connected
16,connected
17,disconnected
18,connected
19,connected
20,connected
21,disconnected
22,disconnected
23,disconnected
24,disconnected
25,disconnected
26,disconnected
27,disconnected
28,disconnected
29,disconnected
30,connected

As can be seen, there are "hollows": runs of lines with the "disconnected" value inside the sequence file.

I want to detect these "holes", and it would be useful to be able to set a minimum number n of missing entries in the sequence.
E.g. for n=5, a detectable hole would be the 7...13 part, as there are at least 5 "disconnected" in a row in the sequence. However, the missing 17 should not be considered detectable in this case. Again, at line 21 we get a valid disconnection.

Something like:

$ detector Status.txt -n 5 --pattern connected
7
21

... that could be interpreted like:

- Missing more than 5 "connected" starting at 7.
- Missing more than 5 "connected" starting at 21.

I need to script this in a Linux shell, so I was thinking about programming some loop, parsing strings and so on, but I feel this could be done with Linux shell tools and maybe some simpler programming. Is there a way?

Even though small programs like csvtool are a valid solution, more common Linux commands (like grep, cut, awk, sed, wc, etc.) would be more useful to me when working with embedded devices.

#!/usr/bin/env bash
last_connected=0
last_num=0 last_state=connected  # copies of the last line read; read clears num/state at EOF
min_hole_size=${1:-5}  # holes must be longer than this; default 5, or first command-line argument
while IFS=, read -r num state; do
  last_num=$num last_state=$state
  if [[ $state = connected ]]; then
    if (( (num-last_connected) > (min_hole_size+1) )); then
      echo "Found a hole running from $((last_connected + 1)) to $((num - 1))"
    fi
    last_connected=$num
  fi
done

# Special case: Need to also handle a hole that's still open at EOF.
# Use the saved copies, because the failed read at EOF leaves num and state empty.
if [[ $last_state != connected ]] && (( last_num - last_connected > min_hole_size )); then
  echo "Found a hole running from $((last_connected + 1)) to $last_num"
fi

...emits, given your file on stdin (./detect-holes <in.txt):

Found a hole running from 7 to 13
Found a hole running from 21 to 29

See:

  • BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
  • The conditional expression -- the [[ ]] syntax used to make it safe to do string comparisons without quoting expansions.
  • Arithmetic comparison syntax -- valid in $(( )) in all POSIX-compliant shells; also available without the expansion side effects as (( )) as a bash extension.
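
Since the question mentions embedded devices, where bash may not be installed, the same loop can be sketched in POSIX sh using [ ] and $(( )) only. This is a sketch under assumptions rather than the answer author's code: the function name detect_holes and the variables last_num and last_state are introduced here for illustration. Like the bash script above, it reports holes strictly longer than the threshold.

```shell
#!/bin/sh
# Hypothetical POSIX sh port of the bash loop above.
# Reads "num,state" lines on stdin and reports every run of more than
# $1 (default 5) consecutive lines whose state is not "connected".
detect_holes() {
  min_hole_size=${1:-5}
  last_connected=0
  last_num=0
  last_state=connected
  while IFS=, read -r num state; do
    [ -n "$num" ] || continue          # skip blank lines
    last_num=$num                      # remember the last line seen, because
    last_state=$state                  # read empties num/state when it hits EOF
    if [ "$state" = connected ]; then
      if [ $((num - last_connected - 1)) -gt "$min_hole_size" ]; then
        echo "Found a hole running from $((last_connected + 1)) to $((num - 1))"
      fi
      last_connected=$num
    fi
  done
  # A hole that is still open when the input ends.
  if [ "$last_state" != connected ] &&
     [ $((last_num - last_connected)) -gt "$min_hole_size" ]; then
    echo "Found a hole running from $((last_connected + 1)) to $last_num"
  fi
}
```

Called as detect_holes 5 <Status.txt on the file from the question, this should print the same two lines as the bash version (7 to 13 and 21 to 29).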

This is the perfect use case for awk, since the machinery of line reading, column splitting, and matching is all built in. The only tricky bit is getting the command line argument to your script, but it's not too bad:

#!/usr/bin/env bash
awk -v window="$1" -F, '
BEGIN { if (window=="") {window = 1} }

$2=="disconnected"{if (consecutive==0){start=NR}; consecutive++}
$2!="disconnected"{if (consecutive>window){print start}; consecutive=0}

END {if (consecutive>window){print start}}'

The window value is supplied as the first command line argument; if left out, it defaults to 1, which means "display the start of gaps with at least two consecutive disconnections". The name window could probably be better. You can give it 0 to include single disconnections. Sample output below. (Note that I added a series of 2 disconnections at the end to test the failure that Charles mentions.)

njv@organon:~/tmp$ ./tst.sh 0 < status.txt # any number of disconnections
7
17
21
31
njv@organon:~/tmp$ ./tst.sh < status.txt # at least 2 disconnections
7
21
31
njv@organon:~/tmp$ ./tst.sh 8 < status.txt # at least 9 disconnections
21
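
The script above prints NR, which matches the first column only because the sample ids happen to equal the line numbers, and its threshold means "more than window". A possible variant, assuming you would rather pass the minimum run length directly and report the id from column 1, might look like this. The function name gap_starts and the awk variables min, run and start are made up for this sketch.

```shell
# Hypothetical helper: print the id (column 1) that starts each run of
# at least $1 (default 2) consecutive "disconnected" lines read from stdin.
gap_starts() {
  awk -F, -v min="${1:-2}" '
    # Inside a gap: remember the id of its first line and count its length.
    $2 == "disconnected" { if (run == 0) start = $1; run++; next }
    # Any other line closes the gap; report it if it was long enough.
                         { if (run >= min) print start; run = 0 }
    # A gap that is still open at end of input.
    END                  { if (run >= min) print start }
  '
}
```

With the 30-line file from the question, gap_starts 5 <Status.txt should print 7 and 21, since the runs 7 to 13 and 21 to 29 are the only ones with at least 5 lines.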

Awk solution:

detector.awk script:

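#!/bin/awk -f
# f: inside a gap, c: lines in the gap so far, nr: line number where it started

BEGIN { FS="," }
$2 == "disconnected"{
    if (f && NR-c==nr) c++;
    else { f=1; c++; nr=NR }
}
$2 == "connected"{
    if (f) {
        if (c > n) {
            # report the threshold actually passed with -v n=... instead of a fixed 5
            printf "- Missing more than %d \042connected\042 starting at %d.\n", n, nr
        }
        f=c=0
    }
}
# a gap that is still open at the end of the file
END {
    if (f && c > n) {
        printf "- Missing more than %d \042connected\042 starting at %d.\n", n, nr
    }
}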

Usage:

awk -f detector.awk -v n=5 status.txt

The output:

- Missing more than 5 "connected" starting at 7.
- Missing more than 5 "connected" starting at 21.
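
For the command-line shape shown in the question (detector Status.txt -n 5 --pattern connected), one option is a thin shell wrapper around awk. The sketch below is only an illustration under assumptions: the option parsing is minimal (it expects a value after -n and --pattern), the defaults of 5 and connected are chosen here, and any line whose second field is not exactly the pattern counts as part of a hole. It uses the "at least n in a row" reading from the question.

```shell
# Hypothetical wrapper mimicking: detector FILE -n N --pattern PATTERN
# Prints the id (column 1) that starts each run of at least N consecutive
# lines whose second field is not PATTERN. Reads stdin if no FILE is given.
detector() {
  file=-
  n=5
  pattern=connected
  while [ $# -gt 0 ]; do
    case $1 in
      -n)        n=$2; shift 2 ;;
      --pattern) pattern=$2; shift 2 ;;
      *)         file=$1; shift ;;
    esac
  done
  awk -F, -v n="$n" -v pat="$pattern" '
    # A matching line closes any open hole; report it if long enough.
    $2 == pat { if (run >= n) print start; run = 0; next }
    # A non-matching line opens or extends a hole.
              { if (run == 0) start = $1; run++ }
    # A hole that is still open at end of input.
    END       { if (run >= n) print start }
  ' "$file"
}
```

On the Status.txt from the question, detector Status.txt -n 5 --pattern connected should print 7 and 21.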
