
Examining realtime data timestamp and redirecting output with bash

I have been using socat to pull ASCII streams over UDP and write them to files. The following is one such line.

socat UDP-RECV:$UDP_PORT,reuseaddr - | cat >> $INSTRUMENT_$JDAY_RAW &

Each stream being received already has its data timestamped by the sender using ts (part of moreutils) with the year, Julian day, hour, minute, second, and millisecond. If the Julian day changes, the JDAY variable on the receiving end doesn't get reinitialized, and cat merrily keeps piping data into the same file, the one named for yesterday's Julian day.

Here is an example of the UDP stream being received by socat. It is being recorded at 20 Hz.

2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0

2015 317 06 34 43 353 winch680 000117.5 00000000 00000000.0

Is there some way in bash to take each line received by socat, examine the jday timestamp field, and change the output file according to that timestamp?

Not with cat. You'll need a [not bash] script (e.g. perl/python) or a C program.

Replace:

socat UDP-RECV:$UDP_PORT,reuseaddr - | cat >> $INSTRUMENT_$JDAY_RAW &

With:

socat UDP-RECV:$UDP_PORT,reuseaddr - | myscript &

Where myscript looks like:

while (1) {
    get_data_from_socat_on_stdin();

    if (jdaynew != jdayold) {
        close_output_file();
        jdayold = jdaynew;
    }

    if (output_file_not_open)
        open_output_file(jdaynew);

    write_data_to_output_file();
}

You can parse the input stream using bash's read builtin; run help read for details. By default it splits tokens on whitespace. If you provide a two-line preview of what your output looks like, it will be easier to help.
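As a minimal sketch of what read does with one of the sample lines from the question: each whitespace-separated token lands in the next named variable, and the last variable collects whatever is left over. The function name show_fields and the variable names are made up for illustration.

```shell
# Hypothetical helper: split each line of the stream into named fields.
show_fields() {
  local YR JDY HR MIN SEC MSEC INST REST
  # -r keeps backslashes literal; REST receives everything after INST.
  while read -r YR JDY HR MIN SEC MSEC INST REST; do
    echo "jday=${JDY} instrument=${INST} rest=${REST}"
  done
}

# Usage with a sample line from the question:
#   echo "2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0" | show_fields
# prints: jday=317 instrument=winch680 rest=000117.9 00000000 00000000.0
```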

The variables $INSTRUMENT and $JDAY have to be defined before that cat command is launched, because the shell expands the file name and opens the file once, before cat starts writing to it.
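To see why changing $JDAY later has no effect, it helps to watch a redirection being set up: the target name is expanded once, when the command starts. A small sketch, using a hypothetical function demo_fixed_target that takes a scratch directory as its argument:

```shell
# Hypothetical demo: the redirection target is expanded once, up front.
demo_fixed_target() {
  local dir JDAY
  dir=$1
  JDAY=317
  {
    echo "first line"
    JDAY=318            # changing the variable now does not reopen anything
    echo "second line"
  } >> "${dir}/demo_${JDAY}_RAW"
}

# Usage: demo_fixed_target "$(mktemp -d)"
# Both lines end up in demo_317_RAW; no demo_318_RAW is created.
```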

If $JDAY and $INSTRUMENT are somehow to be extracted from each line, you can use the following bash snippet (assuming lines read by socat look like <INSTRUMENT> <JDAY> <TS> yaddi yadda ...):

function triage_per_day () {
  while read INSTRUMENT JDAY TS REST; do
    echo "$TS $REST" >> "${INSTRUMENT}_${JDAY}_RAW";
  done
}
triage_per_day < <(socat UDP-RECV:"${UDP_PORT}",reuseaddr -)

If you want to get fancier, you can use file descriptor redirections to keep writing to the same file as long as the day is the same. This minimizes the number of file opens and closes bash has to do, which helps it run a bit faster.

function triage_per_day () {
  local LAST_JDAY=init
  exec 5>&1 # save stdout
  exec 1>&2 # echos are sent to stderr until JDAY is redefined

  while read INSTRUMENT JDAY TS REST; do
    if [[ "$JDAY" != "$LAST_JDAY" ]]; then
      # we need to change output file
      # send stdout to file in append mode
      exec 1>>"${INSTRUMENT}_${JDAY}_RAW"
      LAST_JDAY="${JDAY}"
    fi
    echo "$TS $REST"
  done

  exec 1>&5 # restore stdout
  exec 5>&- # close stdout copy
}
triage_per_day < <(socat UDP-RECV:"${UDP_PORT}",reuseaddr -)
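The snippets above assume lines of the form <INSTRUMENT> <JDAY> <TS> .... The sample in the question instead starts with year, Julian day, hour, minute, second and millisecond, followed by the instrument name. A hedged adaptation of the file-descriptor idea to that layout might look like the following. The function name split_by_jday, the variable names and the choice of descriptor 3 are my own assumptions; the output name follows the ${INSTRUMENT}_${JDAY}_RAW pattern used above.

```shell
# Sketch: route each line to <instrument>_<jday>_RAW, reopening the
# output only when the target file name changes.
split_by_jday() {
  local YR JDY HR MIN SEC MSEC INST REST current target
  current=""
  while read -r YR JDY HR MIN SEC MSEC INST REST; do
    [ -n "$INST" ] || continue          # skip short or empty lines
    target="${INST}_${JDY}_RAW"
    if [ "$target" != "$current" ]; then
      exec 3>>"$target"                 # reopening fd 3 closes the previous file
      current=$target
    fi
    echo "$YR $JDY $HR $MIN $SEC $MSEC $INST $REST" >&3
  done
  if [ -n "$current" ]; then
    exec 3>&-                           # close the last output file
  fi
}

# Usage (assumes UDP_PORT is set):
#   socat UDP-RECV:"${UDP_PORT}",reuseaddr - | split_by_jday &
```

Using a dedicated descriptor leaves stdout and stderr untouched, and comparing the whole file name means a change of instrument also switches files, not just a change of day.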

If you wish to tokenize your lines on characters other than whitespace, say commas, you can locally modify the special variable IFS:

function extract_ts () {
   local IFS=,; # special bash variable: internal-field-separator
   # $REST will contain everything after the third token. it is a good
   # practice to specify one more name than your last token of interest.
   while read TOK1 TS REST; do
     echo "timestamp is $TS";
   done
}
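A common variant, sketched here with a hypothetical function name extract_ts_csv, sets IFS only for the read command itself, so nothing else in the function sees the changed separator:

```shell
# Sketch: comma-separated input, with IFS scoped to the read command only.
extract_ts_csv() {
  local TOK1 TS REST
  # IFS=, applies to this read alone; -r keeps backslashes literal.
  while IFS=, read -r TOK1 TS REST; do
    echo "timestamp is $TS"
  done
}

# Usage:
#   echo "winch680,06:34:43,117.9,0" | extract_ts_csv
# prints: timestamp is 06:34:43
```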

If you need fancier processing of each line to extract timestamps and other fields, you may instead execute external programs (python/perl/cut/awk/grep, etc.), but this will be much slower than simply sticking with bash builtins like read or echo. If you have to do this, and speed is an issue, you may consider moving your script to a different language that gives you the expressiveness you need. You may also wish to look into bash Pattern substitution in the manual if you need fancy regular expressions.

function extract_ts () {
   # store each line in the variable $LINE
   while read LINE; do
     TS="$(echo "$LINE" | ...)";
     echo "Timestamp is $TS";
   done
}
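If the field you want can be reached by trimming a prefix and a suffix, parameter expansion does the job without spawning a process per line. A sketch, assuming the Julian day is the second field and that fields are separated by single spaces, as in the question's sample; jday_of is a made-up helper name:

```shell
# Sketch: pull the second space-separated field out of a line
# using only parameter expansion (no external programs).
jday_of() {
  local line rest
  line=$1
  rest=${line#* }        # drop the first field and the space after it
  echo "${rest%% *}"     # keep what precedes the next space
}

# Usage:
#   jday_of "2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0"
# prints: 317
```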

Recommended practices

Also, I should mention that it is good practice to surround your bash variables in double quotes (as in the answer) if you intend to use them as filename parameters. This is especially true if the names contain spaces or special characters, as could be expected from a filename derived from dates or times. In cases where your variables expand to nothing (due to human or programming error), positional parameters will be missing, sometimes with bad repercussions.

Consider:

# copy two files to the directory (bad)
$ cp file1 file2 $MYDIR

If $MYDIR is undefined, then this command amounts to overwriting file2 with the contents of file1. Contrast this with cp file1 file2 "$MYDIR", which will fail early because the target "" does not exist.
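One way to see the difference without touching any files is to count the arguments a command would receive. In this sketch count_args is a hypothetical helper, and an empty MYDIR stands in for an undefined one:

```shell
# Hypothetical helper: report how many arguments were passed.
count_args() { echo "$#"; }

MYDIR=""                          # stands in for an undefined variable
count_args file1 file2 $MYDIR     # prints 2: the empty expansion vanishes
count_args file1 file2 "$MYDIR"   # prints 3: an empty string is still an argument
```

With two arguments cp treats file2 as the destination; with three, it looks for a directory named by the empty string and gives up.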

Another source of problems that I see in your question is variable names followed by underscores _, as in $INSTRUMENT_$JDAY_RAW. Those names should be surrounded by curly braces { }.

INSTRUMENT=6
BAR=49
echo $INSTRUMENT_$BAR # prints '49', but you may have expected 6_49

Because _ is a valid character in variable names, bash will greedily 'glue' the '_' after INSTRUMENT to match the longest valid variable name possible, which is $INSTRUMENT_. This variable is undefined, however, and expands to the empty string, so you're left with the rest, $BAR. This example can be correctly rewritten as:

INSTRUMENT=6
BAR=49
echo ${INSTRUMENT}_${BAR} # prints 6_49

or even better (avoiding future surprises if values ever change)

echo "${INSTRUMENT}_${BAR}" # prints 6_49

This is the code that worked for me. The input UDP stream looks like this:

2015 317 06 34 43 303 winch680 000117.9 00000000 00000000.0

    #!/bin/bash
    # This code creates a function which reads the fields of each line in the
    # udp stream into variables and uses those fields to choose the output file.
    UDP_PORT=5639
    function DATAOUT () {
        # Fields follow the stream layout: year, Julian day, hour, minute,
        # second, millisecond, instrument, then the data columns.
        while read -r YR JDY HR MIN SEC MSEC INST TENS SPEED LINE; do
            echo "$YR $JDY $HR $MIN $SEC $MSEC $INST $TENS $SPEED $LINE" >> "${INST}_${JDY}_RAW"
        done
    }
    DATAOUT < <(socat udp-recv:"${UDP_PORT}",reuseaddr -)
