Passing field from awk pattern match into shell date command substitution
I would like to convert an ISO 8601 date (YYYY-MM-DD)
to something that looks like Fri, 03 Dec 2021 00:00:00 -0600
(RFC 822 / RFC 5322). I have not been able to find questions that address this specific issue.
I have an RSS XML file that looks like this:
bash-5.1$ cat feed.xml
<rss version="2.0">
<channel>
<title>Title string</title>
<link>https://domain/feed.xml</link>
<description>Description string here</description>
<language>en-us</language>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>2021-12-03</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>2019-08-13</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>2018-11-23</pubDate>
</item>
</channel>
</rss>
The output I expect after parsing would be something like this:
...
<pubDate>Fri, 03 Dec 2021 00:00:00 -0600</pubDate>
...
<pubDate>Tue, 13 Aug 2019 00:00:00 -0500</pubDate>
...
<pubDate>Fri, 23 Nov 2018 00:00:00 -0600</pubDate>
...
This output can usually be achieved with the GNU date command, like so:
bash-5.1$ date -d "2018-11-23" +"%a, %d %b %Y %T %z"
Fri, 23 Nov 2018 00:00:00 -0600
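The offsets in the expected output differ (-0500 in August, -0600 in November and December) because %z reflects daylight saving time in the local time zone, presumably US Central here. A minimal sketch, assuming bash and GNU date. The helper name to_rfc822 is hypothetical, and the POSIX TZ string CST6CDT,M3.2.0,M11.1.0 (US Central rules) is an assumption used only to reproduce the expected output:

```shell
#!/bin/bash
# Hypothetical helper: convert one YYYY-MM-DD date to RFC 822 style.
# Requires GNU date (-d); the offset printed by %z follows the TZ in effect.
to_rfc822() {
  date -d "$1" +"%a, %d %b %Y %T %z"
}

# Assumed zone: POSIX TZ string for US Central time (CST6, CDT from the
# second Sunday in March to the first Sunday in November).
# LC_ALL=C keeps day and month names in English.
export TZ='CST6CDT,M3.2.0,M11.1.0' LC_ALL=C
for d in 2021-12-03 2019-08-13 2018-11-23; do
  to_rfc822 "$d"
done
```

Under those assumptions the loop reproduces the three timestamps from the expected output; in practice you would leave TZ unset and let the system zone decide.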
I am trying to accomplish this with awk
because I learned that command substitution is possible, and I believe I am close, but not quite:
bash-5.1$ awk -F "[><]" -v date="$(date +"%a, %d %b %Y %T %z" -d "$3")" '/pubDate/ {print $3date}' feed.xml
2021-12-03Sun, 05 Dec 2021 00:00:00 -0600
2019-08-13Sun, 05 Dec 2021 00:00:00 -0600
2018-11-23Sun, 05 Dec 2021 00:00:00 -0600
It seems that both the pattern match and the date command execute successfully, but the awk $3
field does not appear to be passed into the shell date
command, so today's date is shown instead of the converted one.
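This can be checked directly. The $(...) in -v date="$(...)" is expanded by the shell once, before awk starts, so "$3" there is the shell's third positional parameter (normally empty), not awk's field $3. With GNU date, an empty -d string means the beginning of the current day, which matches the 00:00:00 timestamps above. A minimal sketch, assuming bash and GNU date:

```shell
#!/bin/bash
# The command substitution in  -v date="$(date ... -d "$3")"  runs once, in the
# shell, before awk reads any input. Here "$3" is the shell's 3rd positional
# parameter, which is empty, not awk's $3 field.
set --                          # clear positional parameters, as in an interactive shell
shell_third="$3"                # expands to the empty string
fixed=$(date -d "$shell_third" +"%a, %d %b %Y %T %z")

printf 'shell $3 is: "%s"\n' "$shell_third"
printf 'value given to awk: %s\n' "$fixed"   # today's date at 00:00:00
```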
How do I pass the $3
field from the awk pattern match into the date command so that it converts the date based on the field value?
Any help is greatly appreciated!
Altering your awk
code slightly:
$ awk -v date="$(date +"%a, %d %b %Y %T %z")" 'BEGIN {FS=OFS=">"}$1~/pubDate/{split($2,a,"<"); a[1]=date; $2=a[1]"<"a[2]}1' input_file
<rss version="2.0">
<channel>
<title>Title string</title>
<link>https://domain/feed.xml</link>
<description>Description string here</description>
<language>en-us</language>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>Mon, 06 Dec 2021 00:00:00 +0000</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>Mon, 06 Dec 2021 00:00:00 +0000</pubDate>
</item>
<item>
<title>title string here</title>
<link>link string with https style information</link>
<guid>link string with https style information</guid>
<pubDate>Mon, 06 Dec 2021 00:00:00 +0000</pubDate>
</item>
</channel>
</rss>
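The code above puts the same date into every <pubDate>, because date still runs only once, in the shell. To convert each field value, awk itself has to run date once per matching line. A minimal sketch of that approach, assuming GNU date, any POSIX awk, and one <pubDate> element per line as in feed.xml. It uses awk's cmd | getline var pipe plus close(cmd); the function name rfc822_pubdates is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: rewrite every <pubDate>YYYY-MM-DD</pubDate> read on stdin.
# Assumes GNU date (-d) and at most one <pubDate> element per line.
rfc822_pubdates() {
  awk -F '[<>]' '
    # Only touch lines whose pubDate is a plain YYYY-MM-DD date; this also
    # ensures nothing but digits and hyphens is ever passed to the shell.
    /<pubDate>[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]<\/pubDate>/ {
      # With -F "[<>]", field $3 holds the text between <pubDate> and </pubDate>.
      cmd = "date -d \"" $3 "\" +\"%a, %d %b %Y %T %z\""
      if ((cmd | getline newdate) > 0)
        sub(/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/, newdate)
      # Close the pipe; otherwise a repeated date (same cmd string)
      # would read EOF from the already-drained pipe.
      close(cmd)
    }
    { print }
  '
}
```

Usage: rfc822_pubdates < feed.xml > feed.new.xml. Spawning one date process per item is fine for a small feed; the gawk version further down avoids the extra processes.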
sed
can also be used if the shell variable is set first:
$ date="$(date +"%a, %d %b %Y %T %z")"
$ sed "/pubDate/ s/....-..-../$date/" input_file
As noted in the comments, an XML parsing tool is generally the recommended approach. If the XML file is laid out as regularly as the provided example,
bash
or awk
may work under limited conditions.
With bash:
#!/bin/bash
while IFS= read -r line; do
if [[ $line =~ (.*<pubDate>)([0-9]{4}-[0-9]{2}-[0-9]{2})(</pubDate>.*) ]]; then
datestr=$(date -d "${BASH_REMATCH[2]}" +"%a, %d %b %Y %T %z")
line="${BASH_REMATCH[1]}$datestr${BASH_REMATCH[3]}"
fi
echo "$line"
done < feed.xml
The condition $line =~ (.*<pubDate>)([0-9]{4}-[0-9]{2}-[0-9]{2})(</pubDate>.*)
matches the <pubDate>
line and assigns the parenthesized substrings to the bash array ${BASH_REMATCH[@]}. Then line
is reconstructed using the reformatted date string.
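To see exactly what each parenthesized group captures, the match can be run on a single sample line. A minimal sketch, assuming bash 3.2 or later; keeping the regex in a variable (re, a name chosen here for illustration) is a common way to avoid quoting surprises inside [[ ]]:

```shell
#!/bin/bash
# Inspect the capture groups that =~ stores in BASH_REMATCH.
line='    <pubDate>2021-12-03</pubDate>'
re='(.*<pubDate>)([0-9]{4}-[0-9]{2}-[0-9]{2})(</pubDate>.*)'

if [[ $line =~ $re ]]; then
  # Index 0 is the whole match; 1..3 are the parenthesized groups.
  for i in 0 1 2 3; do
    printf 'BASH_REMATCH[%d] = %s\n' "$i" "${BASH_REMATCH[$i]}"
  done
fi
```

Note that $re must stay unquoted on the right-hand side of =~; writing "$re" makes bash match it as a literal string rather than as a regex.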
If you have gawk, which supports the mktime()
and strftime()
functions, you can also say:
gawk '
{
if (match($0, /^(.*<pubDate>)([0-9]{4})-([0-9]{2})-([0-9]{2})(<\/pubDate>.*)/, a) ) {
ts = mktime(a[2] " " a[3] " " a[4] " 00 00 00") # timestamp since the epoch
datestr = strftime("%a, %d %b %Y %T %z", ts)
$0 = a[1] datestr a[5]
}
} 1' feed.xml