
Passing field from awk pattern match into shell date command substitution

I would like to convert an (ISO 8601) date from YYYY-MM-DD to something that looks like Fri, 03 Dec 2021 00:00:00 -0600 (RFC 822 / RFC 5322). I have not been able to find questions that address this specific issue.

I have an RSS XML file that looks like this:

bash-5.1$ cat feed.xml
<rss version="2.0">
    <channel>
        <title>Title string</title>
        <link>https://domain/feed.xml</link>
        <description>Description string here</description>
        <language>en-us</language>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>2021-12-03</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>2019-08-13</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>2018-11-23</pubDate>
</item>
</channel>
</rss>

The output I expect after parsing would be something like this:

...
<pubDate>Fri, 03 Dec 2021 00:00:00 -0600</pubDate> 
...
<pubDate>Tue, 13 Aug 2019 00:00:00 -0500</pubDate> 
...
<pubDate>Fri, 23 Nov 2018 00:00:00 -0600</pubDate> 
...

This output can usually be achieved with the date command (GNU date, run here from bash) like so:

bash-5.1$ date -d "2018-11-23" +"%a, %d %b %Y %T %z"
Fri, 23 Nov 2018 00:00:00 -0600
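
As a side note, GNU date also has an -R option that prints the RFC 5322 layout directly, without a format string. A minimal sketch, assuming GNU date; the function name to_rfc5322 is made up for illustration, and TZ is pinned to UTC only so that the output is reproducible:

```shell
# Hypothetical helper: convert a YYYY-MM-DD date to RFC 5322 form.
# Assumes GNU date (the -d and -R options are GNU extensions).
to_rfc5322() {
    # LC_ALL=C keeps the day and month names in English.
    LC_ALL=C date -R -d "$1"
}

# TZ is set only for this call; drop it to use the local time zone.
TZ=UTC to_rfc5322 "2018-11-23"
# prints: Fri, 23 Nov 2018 00:00:00 +0000
```

With the local time zone instead of UTC, the offset at the end follows whatever zone and daylight-saving rules apply on that date.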

I am trying to accomplish this with awk because I learned that command substitution is possible, and I believe I am close, but not quite:

bash-5.1$ awk -F "[><]" -v date="$(date +"%a, %d %b %Y %T %z" -d "$3")" '/pubDate/ {print $3date}' feed.xml 
2021-12-03Sun, 05 Dec 2021 00:00:00 -0600
2019-08-13Sun, 05 Dec 2021 00:00:00 -0600
2018-11-23Sun, 05 Dec 2021 00:00:00 -0600

It seems that both the pattern match and the date command execute successfully, but the awk $3 field does not appear to be passed into the shell date command, so the current date is shown instead of the converted one.

How do I pass the $3 field from the awk pattern match into the date command so that it can convert the date based on the field value?

Any help is greatly appreciated!

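Altering your awk code slightly. Note that the date variable is filled in once by the shell before awk starts, so every pubDate element receives the same current timestamp rather than its own converted date.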

$ awk -v date="$(date +"%a, %d %b %Y %T %z")" 'BEGIN {FS=OFS=">"}$1~/pubDate/{split($2,a,"<"); a[1]=date; $2=a[1]"<"a[2]}1' input_file
<rss version=2.0>
    <channel>
        <title>Title string</title>
        <link>https://domain/feed.xml</link>
        <description>Description string here</description>
        <language>en-us</language>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>Mon, 06 Dec 2021 00:00:00 +0000</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>Mon, 06 Dec 2021 00:00:00 +0000</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>Mon, 06 Dec 2021 00:00:00 +0000</pubDate>
</item>
</channel>
</rss>

sed can also be used if the variable is set.

$ date="$(date +"%a, %d %b %Y %T %z")" 
$ sed "/pubDate/ s/....-..-../$date/" input_file
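
The two commands above insert one date string that the shell computed ahead of time. To convert each element's own date, awk can run date itself for every matching line: build the command as a string and read its output with getline. A minimal sketch, assuming GNU date and one pubDate element per line as in the sample feed; the function name convert_pubdates is made up for illustration:

```shell
# Hypothetical helper: rewrite the ISO date inside each <pubDate> line of file "$1".
# Assumes GNU date (for -d) and at most one pubDate element per line.
convert_pubdates() {
    awk '
    /<pubDate>/ && match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
        iso = substr($0, RSTART, RLENGTH)
        # Build the shell command as a string; iso holds only digits and hyphens.
        cmd = "date -d " iso " +\"%a, %d %b %Y %T %z\""
        # Read the first line of the command output into out.
        if ((cmd | getline out) > 0)
            $0 = substr($0, 1, RSTART - 1) out substr($0, RSTART + RLENGTH)
        close(cmd)   # close the pipe so later records can start new commands
    }
    { print }
    ' "$1"
}
```

Calling convert_pubdates feed.xml prints the whole feed with only the pubDate values changed. This starts one date process per matching line, which is fine for a small feed but slow for very large files.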

As commented, using an XML parsing tool is generally recommended. If the XML file is laid out as regularly as the provided example, bash or awk may work under limited conditions.
With bash:

#!/bin/bash

while IFS= read -r line; do
    if [[ $line =~ (.*<pubDate>)([0-9]{4}-[0-9]{2}-[0-9]{2})(</pubDate>.*) ]]; then
        datestr=$(date -d "${BASH_REMATCH[2]}" +"%a, %d %b %Y %T %z")
        line="${BASH_REMATCH[1]}$datestr${BASH_REMATCH[3]}"
    fi
    echo "$line"
done < feed.xml

The condition $line =~ (.*<pubDate>)([0-9]{4}-[0-9]{2}-[0-9]{2})(</pubDate>.*) matches the <pubDate> line and assigns the parenthesized substrings to the bash array BASH_REMATCH (elements ${BASH_REMATCH[1]} through ${BASH_REMATCH[3]}). The line is then reconstructed using the reformatted date string.
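
To see what each capture group holds, the match can be tried on a single line in isolation. A minimal sketch, assuming bash (the [[ =~ ]] operator and BASH_REMATCH are bash features); the function name show_groups is made up for illustration:

```shell
#!/bin/bash
# Hypothetical helper: print the three captured pieces of one pubDate line.
show_groups() {
    # Keeping the regex in a variable avoids quoting problems inside [[ ]].
    local re='(.*<pubDate>)([0-9]{4}-[0-9]{2}-[0-9]{2})(</pubDate>.*)'
    if [[ $1 =~ $re ]]; then
        printf 'prefix=%s\n' "${BASH_REMATCH[1]}"
        printf 'date=%s\n'   "${BASH_REMATCH[2]}"
        printf 'suffix=%s\n' "${BASH_REMATCH[3]}"
    fi
}

show_groups '<pubDate>2018-11-23</pubDate>'
# prints:
# prefix=<pubDate>
# date=2018-11-23
# suffix=</pubDate>
```

Lines that do not match print nothing, which mirrors how the loop above passes non-pubDate lines through unchanged.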

If you have gawk, which supports the mktime() and strftime() functions, you can also do this with gawk:

gawk '
{
    if (match($0, /^(.*<pubDate>)([0-9]{4})-([0-9]{2})-([0-9]{2})(<\/pubDate>.*)/, a) ) {
        ts = mktime(a[2] " " a[3] " " a[4] " 00 00 00")    # timestamp since the epoch
        datestr = strftime("%a, %d %b %Y %T %z", ts)
        $0 = a[1] datestr a[5]
    }
} 1' feed.xml
