
Examining realtime data timestamp and redirecting output with bash

I have been using socat to pull ASCII streams over UDP and write them to files. The following is one such line.

socat UDP-RECV:$UDP_PORT,reuseaddr - | cat >> $INSTRUMENT_$JDAY_RAW &

Each stream being received already has its data timestamped by the sender using ts (part of moreutils) with the year, Julian day, hour, min, second, and msec. If the Julian day changes, the JDAY variable on the receiving end doesn't get reinitialized and cat merrily keeps piping data into the same file with yesterday's timestamp.

Here is an example of the UDP stream being received by socat. It is recorded at 20 Hz.

2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0

2015 317 06 34 43 353 winch680 000117.5 00000000 00000000.0

Is there some way in bash to take each line received by socat, examine the Julian day field of its timestamp, and switch the output file according to that field?

Not with cat. You'll need a script or program that is not plain bash (e.g. Perl, Python, or C).

Replace:

socat UDP-RECV:$UDP_PORT,reuseaddr - | cat >> $INSTRUMENT_$JDAY_RAW &

With:

socat UDP-RECV:$UDP_PORT,reuseaddr - | myscript &

Where myscript follows this logic (pseudocode):

while (1) {
    get_data_from_socat_on_stdin();

    if (jdaynew != jdayold) {
        close_output_file();
        jdayold = jdaynew;
    }

    if (output_file_not_open)
        open_output_file(jdaynew);

    write_data_to_output_file();
}

You can parse the input stream with the read builtin in bash; run help read for details. By default it splits each line into tokens on whitespace. If you provided a two-line preview of what your output looks like, it might be easier to help.
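As a minimal sketch of that splitting, the following feeds one sample line from the question to read. The variable names are assumptions chosen for illustration; only the field order comes from the sample.

```shell
#!/usr/bin/env bash
# Split one sample line from the stream into named fields with `read`.
# Field names are illustrative; only their order comes from the sample line.
line='2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0'

# -r stops read from treating backslashes as escape characters.
# The last name (REST) receives everything left over on the line.
read -r YR JDAY HR MIN SEC MSEC INST REST <<< "$line"

echo "day=$JDAY instrument=$INST rest=$REST"
```

The last variable absorbs all remaining tokens, so naming one more variable than you need keeps the fields of interest clean.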

The variables $INSTRUMENT and $JDAY have to be defined before that cat command is launched, because the shell expands the file name and opens the file once, before cat writes anything to it.
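A small sketch of why the file name never changes: the redirection target is expanded once, when the command starts. The temporary directory and the winch prefix are made up for this demonstration.

```shell
#!/usr/bin/env bash
# Sketch: a redirection target is expanded once, when the command starts.
# The directory and file names below are invented for the demonstration.
demo_dir=$(mktemp -d)
JDAY=317

{
  echo "first line"
  JDAY=318            # changing the variable does not affect the open file
  echo "second line"
} >> "${demo_dir}/winch_${JDAY}_RAW"

ls "$demo_dir"        # lists the single file that was created
```

Both lines end up in the file named after day 317, which mirrors what happens to the long-running cat in the question.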

If $JDAY and $INSTRUMENT are somehow to be extracted from each line, you can use the following bash snippet (assuming lines read by socat look like <INSTRUMENT> <JDAY> <TS> yaddi yadda ... ):

function triage_per_day () {
  while read INSTRUMENT JDAY TS REST; do
    echo "$TS $REST" >> "${INSTRUMENT}_${JDAY}_RAW";
  done
}
triage_per_day < <(socat UDP-RECV:"${UDP_PORT}",reuseaddr -)

If you want to get fancier, you can use file descriptor redirections to keep writing to the same file for as long as the day stays the same. This minimizes the number of file opens and closes bash has to do, so the loop runs a bit faster.

function triage_per_day () {
  local LAST_JDAY=init
  exec 5>&1 # save stdout
  exec 1>&2 # echos are sent to stderr until JDAY is redefined

  while read INSTRUMENT JDAY TS REST; do
    if [[ "$JDAY" != "$LAST_JDAY" ]]; then
      # we need to change output file
      # send stdout to file in append mode
      exec 1>>"${INSTRUMENT}_${JDAY}_RAW"
      LAST_JDAY="${JDAY}"
    fi
    echo "$TS $REST"
  done

  exec 1>&5 # restore stdout
  exec 5>&- # close stdout copy
}
triage_per_day < <(socat UDP-RECV:"${UDP_PORT}",reuseaddr -)

If you wish to tokenize your lines on a character other than whitespace, say commas, you can locally modify the special variable IFS:

function extract_ts () {
   local IFS=,; # special bash variable: internal-field-separator
   # $REST will contain everything after the second token. it is good
   # practice to specify one more name than your last token of interest.
   while read TOK1 TS REST; do
     echo "timestamp is $TS";
   done
}
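As a hedged variation on the function above, IFS can also be set for a single read command, which leaves it untouched everywhere else. The comma-separated sample line here is invented, since the question's stream is whitespace-separated.

```shell
#!/usr/bin/env bash
# Sketch: set IFS for one read command instead of for the whole function.
# The comma-separated sample line is invented for illustration.
sample='winch680,2015-317T06:34:43,000117.9,00000000'

# The IFS assignment in front of read applies to this command only.
IFS=, read -r TOK1 TS REST <<< "$sample"

echo "timestamp is $TS"
echo "rest is $REST"
```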

If you need fancier processing of each line to extract timestamps and other fields, you can instead run external programs (python/perl/cut/awk/grep, etc.), but starting a process for every line is much slower than sticking with bash builtins like read or echo. If you have to do this and speed is an issue, consider rewriting the script in a language that gives you the expressiveness you need. You may also wish to look into Pattern substitution in the bash manual, which works with glob patterns, or the =~ operator of [[ ]] if you need regular expressions.

function extract_ts () {
   # store each line in the variable $LINE
   while read LINE; do
     TS="$(echo "$LINE" | ...)";
     echo "Timestamp is $TS";
   done
}
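One way to avoid an external program per line is bash's built-in regex matching. This is a sketch, not a drop-in replacement for the elided pipeline above; the pattern assumes the "year day ..." layout of the sample line in the question.

```shell
#!/usr/bin/env bash
# Sketch: pull fields out of a line with bash's built-in regex matching.
# The pattern assumes a 4-digit year followed by a 3-digit day of year.
line='2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0'

# Keeping the pattern in a variable avoids quoting problems inside [[ ]].
pattern='^([0-9]{4}) ([0-9]{3}) '

if [[ $line =~ $pattern ]]; then
  # BASH_REMATCH[n] holds the text matched by the n-th parenthesized group.
  YEAR=${BASH_REMATCH[1]}
  JDAY=${BASH_REMATCH[2]}
  echo "year=$YEAR jday=$JDAY"
else
  echo "line did not match: $line" >&2
fi
```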

Recommended practices

Also, I should mention that it is good practice to surround your bash variables in double quotes (as in the answer) if you intend to use them as filename parameters. This is especially true if the names may contain spaces or special characters, as can happen with filenames derived from dates or times. When a variable expands to nothing (due to human or programming error), the unquoted argument disappears entirely and the remaining positional parameters shift, sometimes with bad repercussions.

Consider:

# copy two files to the directory (bad)
$ cp file1 file2 $MYDIR

If $MYDIR is undefined, then this command amounts to overwriting file2 with the contents of file1. Contrast this with cp file1 file2 "$MYDIR" which will fail early because the target "" does not exist.
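The vanishing argument can be seen without touching any files. In this sketch, count_args is a hypothetical helper that only reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Sketch: an unquoted empty variable disappears from the argument list.
# count_args is a hypothetical helper, not a standard command.
count_args () { echo "$#"; }

unset MYDIR

count_args file1 file2 $MYDIR     # unquoted: the empty expansion is dropped
count_args file1 file2 "$MYDIR"   # quoted: an empty string is still passed
```

The first call reports 2 arguments and the second reports 3, which is why the quoted cp fails on an empty target instead of silently treating file2 as the destination.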

Another source of problems that I see in your question is variable names immediately followed by an underscore _, as in $INSTRUMENT_$JDAY_RAW. Those names should be surrounded by curly braces { }.

INSTRUMENT=6
BAR=49
echo $INSTRUMENT_$BAR # prints '49', but you may have expected 6_49

Because _ are valid characters in variable names, bash will attempt to greedily 'glue' the '_' after INSTRUMENT to match the longest valid variable name possible, which would be $INSTRUMENT_ . This variable is undefined however, and expands to the empty string, so you're left with the rest, $BAR . This example can be correctly rewritten as:

INSTRUMENT=6
BAR=49
echo ${INSTRUMENT}_${BAR} # prints 6_49

or even better (avoiding future surprises if values ever change)

echo "${INSTRUMENT}_${BAR}" # prints 6_49

This is the code that worked for me. The input udp stream looks like this:

2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0

    #!/bin/bash
    # This code defines a function that reads the fields of each line
    # of the udp stream into variables
    # and uses those fields to choose the output file.
    UDP_PORT=5639
    function DATAOUT () {
        while read -r YR JDY HR MIN SEC MSEC INST TENS SPEED LINE; do
            echo "$YR $JDY $HR $MIN $SEC $MSEC $INST $TENS $SPEED $LINE" >> "${INST}_${JDY}_RAW"
        done
    }
    DATAOUT < <(socat udp-recv:"${UDP_PORT}",reuseaddr -)
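The script above reopens the output file for every line. At 20 Hz that is probably fine, but it can be combined with the file descriptor idea from the earlier answer. The following is a sketch under assumptions: route_by_day is a hypothetical name, file descriptor 3 is an arbitrary choice, and two canned lines in a temporary directory stand in for the live socat stream.

```shell
#!/usr/bin/env bash
# Sketch: route each line to <instrument>_<jday>_RAW and reopen the output
# only when the target file name changes. fd 3 is an arbitrary choice.
route_by_day () {
  local current="" target
  local YR JDY HR MIN SEC MSEC INST REST
  while read -r YR JDY HR MIN SEC MSEC INST REST; do
    target="${INST}_${JDY}_RAW"
    if [[ "$target" != "$current" ]]; then
      exec 3>>"$target"      # re-pointing fd 3 closes the previous file
      current="$target"
    fi
    echo "$YR $JDY $HR $MIN $SEC $MSEC $INST $REST" >&3
  done
  if [[ -n "$current" ]]; then
    exec 3>&-                # close the last output file
  fi
}

# Example with two canned lines spanning a day change (assumed test data):
cd "$(mktemp -d)" || exit 1
route_by_day <<'EOF'
2015 317 23 59 59 953 winch680 000117.9 00000000 00000000.0
2015 318 00 00 00 003 winch680 000117.5 00000000 00000000.0
EOF
ls
```

With a live stream the function would be fed the same way as DATAOUT, through process substitution from socat. Comparing the whole file name rather than only the day also handles several instruments arriving on one port, at the cost of a reopen each time the instrument alternates.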
