Match datetime format with Bash REGEX

I have data with this datetime format in bash:

28/11/13 06:20:05 (dd/mm/yy hh:mm:ss)

I need to reformat it like:

2013-11-28 06:20:05 (MySQL datetime format)

I am using the following regex:

regex='([0-9][0-9])/([0-9][0-9])/([0-9][0-9])\s([0-9][0-9]/:[0-9][0-9]:[0-9][0-9])'

if [[$line=~$regex]]
then
   $line='20$3-$2-$1 $4';
fi

This produces an error:

./filename: line 10: [[09:34:38=~([0-9][0-9])/([0-9][0-9])/([0-9][0-9])\s([0-9][0-9]/:[0-9][0-9]:[0-9][0-9])]]: No such file or directory

UPDATE:

I want to read this file line by line, parse it, and insert the data into a MySQL database:

'filenameX':

27/11/13 12:20:05 9984 2885 260 54 288 94 696 1852 32 88 27 7 154
27/11/13 13:20:05 9978 2886 262 54 287 93 696 1854 32 88 27 7 154
27/11/13 14:20:05 9955 2875 262 54 287 93 696 1860 32 88 27 7 154
27/11/13 15:20:04 9921 2874 261 54 284 93 692 1868 32 88 27 7 154
27/11/13 16:20:09 9896 2864 260 54 283 92 689 1880 32 88 27 7 154
27/11/13 17:20:05 9858 2858 258 54 279 92 683 1888 32 88 27 7 154
27/11/13 18:20:04 9849 2853 258 54 279 92 683 1891 32 88 27 7 154
27/11/13 19:20:04 9836 2850 257 54 279 93 683 1891 32 88 27 7 154
27/11/13 20:20:05 9826 2845 257 54 279 93 683 1892 32 88 27 7 154
27/11/13 21:20:05 9820 2847 257 54 278 93 682 1892 32 88 27 7 154
27/11/13 22:20:04 9810 2844 257 54 277 93 681 1892 32 88 27 7 154
27/11/13 23:20:04 9807 2843 257 54 276 93 680 1892 32 88 27 7 154
28/11/13 00:20:05 9809 2843 257 54 276 93 680 1747 29 87 17 6 139
28/11/13 01:20:04 9809 2842 257 54 276 93 680 1747 29 87 17 6 139
28/11/13 02:20:05 9809 2843 256 54 276 93 679 1747 29 87 17 6 139
28/11/13 03:20:04 9808 2842 256 54 276 93 679 1747 29 87 17 6 139
28/11/13 04:20:05 9808 2842 256 54 276 93 679 1747 29 87 17 6 139
28/11/13 05:20:39 9807 2842 256 54 276 93 679 1747 29 87 17 6 139
28/11/13 06:20:05 9804 2840 256 54 276 93 679 1747 29 87 17 6 139

Script:

#!/bin/bash

echo "Start!"

while IFS='     ' read -ra ADDR;
do
   for line in $(cat results)
   do
      regex='([0-9][0-9])/([0-9][0-9])/([0-9][0-9]) ([0-9][0-9]:[0-9][0-9]:[0-9]$
      if [[ $line =~ $regex ]]; then
         $line="20${BASH_REMATCH[3]}-${BASH_REMATCH[2]}-${BASH_REMATCH[1]} ${BASH_REMATCH[4]}"
      fi
      echo "insert into table(time, total, caracas, anzoategui) values('$line', '$line', '$line', '$line', '$line');"
   done | mysql -user -password database;
done < filenameX

Result:

time                | total | caracas | anzoategui
0000-00-00 00:00:00 | 9     | 9       | 9
2027-11-13 00:00:00 | 15    | 15      | 15

Note: This answer was accepted based on fixing the bash-focused approach in the OP. For a simpler, awk-based solution, see the last section of this answer.

Try the following:

line='28/11/13 06:20:05' # sample input

regex='([0-9][0-9])/([0-9][0-9])/([0-9][0-9]) ([0-9][0-9]:[0-9][0-9]:[0-9][0-9])'

if [[ $line =~ $regex ]]; then
  line="20${BASH_REMATCH[3]}-${BASH_REMATCH[2]}-${BASH_REMATCH[1]} ${BASH_REMATCH[4]}"
fi

echo "$line"  # -> '2013-11-28 06:20:05'

As for why your code didn't work:

  • As @anubhava pointed out, you need at least 1 space to the right of [[ and to the left of ]].
  • Whether \s works in a bash regex is platform-dependent (Linux: yes; OSX: no), so a single, literal space is the safer choice here.
  • Your variable assignment was incorrect ($line=...): when assigning to a variable, never prefix the variable name with $.
  • Your backreferences were incorrect ($1, ...): to refer to capture groups (subexpressions) in a bash regex you have to use the special ${BASH_REMATCH[@]} array variable; ${BASH_REMATCH[0]} contains the entire string that matched, ${BASH_REMATCH[1]} contains what the first capture group matched, and so on. By contrast, $1, $2, ... refer to the 1st, 2nd, ... argument passed to a shell script or function.
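The points above can be pulled together in a small helper. This is a minimal sketch, not part of the original answer: the function name to_mysql_datetime is hypothetical, and it assumes every two-digit year belongs to the 2000s, as in the question's data.

```bash
#!/usr/bin/env bash

# Hypothetical helper: converts 'dd/mm/yy hh:mm:ss' to 'yyyy-mm-dd hh:mm:ss'.
# Assumption: every two-digit year is in the 2000s.
to_mysql_datetime() {
  local input=$1
  local regex='^([0-9][0-9])/([0-9][0-9])/([0-9][0-9]) ([0-9][0-9]:[0-9][0-9]:[0-9][0-9])$'
  if [[ $input =~ $regex ]]; then
    # BASH_REMATCH[0] is the whole match; [1]..[4] are the capture groups.
    printf '20%s-%s-%s %s\n' \
      "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}"
  else
    # No match: signal failure instead of printing anything.
    return 1
  fi
}

to_mysql_datetime '28/11/13 06:20:05'
```

Returning a non-zero status on a non-matching line lets the caller decide whether to skip or log it, rather than inserting an unconverted value.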

Update, to address the OP's updated question:

I think the following does what you want:

# Read input file and store each col. value in separate variables.
while read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15; do

    # Concatenate the first 2 cols. to form a date + time string.
    dt="$f1 $f2"

    # Parse and reformat the date + time string.
    regex='([0-9][0-9])/([0-9][0-9])/([0-9][0-9]) ([0-9][0-9]:[0-9][0-9]:[0-9][0-9])'
    if [[ "$dt" =~ $regex ]]; then
      dt="20${BASH_REMATCH[3]}-${BASH_REMATCH[2]}-${BASH_REMATCH[1]} ${BASH_REMATCH[4]}"
    fi

    # Echo the SQL command; all of them are piped into a `mysql` command
    # at the end of the loop.
    # !! Fill the $f<n> variables in as needed - I don't know which ones you need.
    # !! Make sure the number of column names matches the number of values.
    # !! Your original code had 4 column names, but 5 values, causing an error.
    echo "insert into table(time, total, caracas, anzoategui) values('$dt', '$f3', '$f4', '$f5');"

done < filenameX | mysql -user -password database

Afterthought: The above solution is based on improvements to the OP's code; below is a streamlined solution that is a one-liner based on awk (spread across multiple lines for readability; tip of the hat to @twalberg for the awk-based date reformatting):

awk -v sq=\' '{
 split($1, tkns, "/");
 dt=sprintf("20%s-%s-%s", tkns[3], tkns[2], tkns[1]); 
 printf "insert into table(time,total,caracas,anzoategui) values(%s,%s,%s,%s);", 
   sq dt " " $2 sq, sq $3 sq, sq $4 sq, sq $5 sq
}' filenameX | mysql -user -password database

Note: To make quoting inside the awk program simpler, a single quote is passed in via variable sq (-v sq=\').
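A quick way to see what the sq variable does in isolation, as a sketch with a made-up input string:

```bash
# The shell turns \' into a literal single quote, which awk receives as sq.
printf '%s\n' 'hello' | awk -v sq=\' '{ print sq $1 sq }'
```

This prints the input word wrapped in single quotes, which is how the one-liner above quotes each SQL value.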

Perl is handy here.

dt="28/11/13 06:20:05"
perl -MTime::Piece -E "say Time::Piece->strptime('$dt', '%d/%m/%y %T')->strftime('%Y-%m-%d %T')"
2013-11-28 06:20:05

This does the trick without any overly complicated regex invocations:

echo "28/11/13 06:20:05" | awk -F'[/ ]' \
    '{printf "20%s-%s-%s %s\n", $3, $2, $1, $4}'

Or, as suggested by @fedorqui in the comments, if the source of your timestamp is date, you can just give it the formatting options you want...
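For instance, a format string along these lines would make date emit the MySQL layout directly. This is a sketch; the original suggestion does not say which format string was meant.

```bash
# Print the current time as 'yyyy-mm-dd hh:mm:ss'.
date '+%Y-%m-%d %H:%M:%S'
```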

Spaces are mandatory in BASH, so use:

[[ "$line" =~ $regex ]] && echo "${line//\//-}"

Also, \s is not portable in BASH regexes (it may not work on your platform), so use this regex with a literal space instead:

regex='([0-9][0-9])/([0-9][0-9])/([0-9][0-9]) ([0-9][0-9]:[0-9][0-9]:[0-9][0-9])'
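If the separator might be a tab or several spaces, a POSIX bracket expression is a portable alternative to \s. A sketch, using the sample value from the question:

```bash
line='28/11/13 06:20:05'
# [[:space:]]+ is a POSIX character class: one or more whitespace characters.
regex='([0-9]{2})/([0-9]{2})/([0-9]{2})[[:space:]]+([0-9]{2}:[0-9]{2}:[0-9]{2})'
if [[ $line =~ $regex ]]; then
  echo "20${BASH_REMATCH[3]}-${BASH_REMATCH[2]}-${BASH_REMATCH[1]} ${BASH_REMATCH[4]}"
fi
```

Note that $regex is left unquoted on the right-hand side of =~; quoting it would make bash treat the pattern as a literal string.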

Thanks all for the sample above.

"T" not appended:

$ line='"2020-11-26 10:20:01.000000","the size of the table is 3.5" (inches)","2020-12-11 10:20:02"'
$ echo "$line" | sed -r 's#(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})#\2T\1#g'
"2020-11-26 10:20:01.000000","the size of the table is 3.5" (inches)","2020-12-11 10:20:02"

"T" appended only to the middle of the first column, not to any other date-formatted column in the row:

$ awk '/[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1]) (2[0-3]|[01][0-9]):[0-5][0-9]*/{print}' test_file | sed -e 's/\s/\T/'
"2020-11-26T10:20:01.000000","the size of the table is 3.5" (inches)","2020-12-11 10:20:02"

Example from above with grouping:

$ line='"2020-11-26 10:20:01.000000","the size of the table is 3.5" (inches)","2020-12-11 10:20:02"'
$ regex='([0-9][0-9])-([0-9][0-9])-([0-9][0-9]) ([0-9][0-9]:[0-9][0-9]:[0-9][0-9])'
$ if [[ $line =~ $regex ]]; then line="20${BASH_REMATCH[3]}-${BASH_REMATCH[2]}-${BASH_REMATCH[1]}T${BASH_REMATCH[4]}"; fi
$ echo "$line" 
2026-11-20T10:20:01

The intention is to insert "T" between the date and time (same field) in every field of a huge CSV file with millions of records, not just the first column; all fields use the same date format, YYYY-MM-DD HH24:MI:SS.
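One possible approach, offered as a sketch rather than a solution tested on the full file: sed's regex syntax does not support \d, and a substitution without the g flag replaces only the first match on each line. Using [0-9] with -E and the g flag rewrites every date-and-time pair on a line, keeping the groups in their original order.

```bash
line='"2020-11-26 10:20:01.000000","the size of the table is 3.5" (inches)","2020-12-11 10:20:02"'
# Put a T between every date and time on the line (g = all matches).
echo "$line" | sed -E 's/([0-9]{4}-[0-9]{2}-[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})/\1T\2/g'
```

For a whole file, the same expression can be given a file argument and its output redirected to a new file. Because sed processes its input as a stream, this should scale to millions of records, though that has not been measured here.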
