Extract the word after a specific word in HTML using Linux commands/scripting

Question

I have a file 'tes.html':

<html>
<head><title>Index of /Data/Movies/Hollywood/2016_2017/</title></head>
<body bgcolor="white">
<h1>Index of /Data/Movies/Hollywood/2016_2017/</h1><hr><pre><a href="../">../</a>
<a href="1%20Buck%20%282017%29/">1 Buck (2017)/</a>                                     25-Nov-2019 10:25       -
<a href="1%20Mile%20to%20You%20%282017%29/">1 Mile to You (2017)/</a>                              25-Nov-2019 10:26       -
<a href="1%20Night%20%282016%29/">1 Night (2016)/</a>                                    25-Nov-2019 10:27       -
</pre><hr></body>
</html>

I want to extract the value that follows '%29/">' into output.txt, with a header line 'title', for example:

title
1 Buck (2017)/
1 Mile to You (2017)/
1 Night (2016)/

How can I produce an output file like the one above with Linux command-line tools such as awk?

I tried this code:

awk '{for (I=1;I<NF;I++) if ($I == "%29/">") print $(I+1)}' file
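The attempt above compares whole whitespace-separated fields against the string, but '%29/">' never forms a field of its own: it is glued to the surrounding href text. (Separately, the inner double quote in "%29/">" would have to be written as \" inside an awk string, otherwise awk reports a syntax error.) A minimal sketch to see the fields awk actually produces, using one data line copied from tes.html:

```shell
# One data line copied verbatim from tes.html.
line='<a href="1%20Buck%20%282017%29/">1 Buck (2017)/</a>                                     25-Nov-2019 10:25       -'

# Print each default (whitespace-separated) field with its index.
# printf '%s\n' passes the line through untouched, % signs included.
printf '%s\n' "$line" |
awk '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }'
```

This prints seven fields; the second one is [href="1%20Buck%20%282017%29/">1], so no field is ever equal to %29/"> on its own.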

Answer 1

Using the samples you've shown, please try the following.

awk 'BEGIN{print "title"} match($0,/%29\/">[^/]*/){print substr($0,RSTART+6,RLENGTH-5)}'  Input_file

Explanation: here is a detailed explanation of the above code.

awk '                                   ##Starting awk program from here.
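BEGIN{print "title"}                     ##Printing the header title before any input is read.
match($0,/%29\/">[^/]*/){               ##Using match to find %29/"> followed by everything up to the next /.
  print substr($0,RSTART+6,RLENGTH-5)   ##Skipping the 6 chars of %29/"> and printing the title plus its trailing /.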
}
'  Input_file                           ##Mentioning Input_file name here.
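The offsets passed to substr can be checked on the sample itself. A minimal sketch, assuming a POSIX shell and an awk that accepts an unescaped / inside a bracket expression of a regex literal (gawk does): it recreates tes.html from the question, runs the command above with its output redirected into output.txt (the file the question asks for), then prints RSTART and RLENGTH for the first matching line.

```shell
# Recreate the sample file; the quoted 'EOF' keeps % and " characters literal.
cat > tes.html <<'EOF'
<html>
<head><title>Index of /Data/Movies/Hollywood/2016_2017/</title></head>
<body bgcolor="white">
<h1>Index of /Data/Movies/Hollywood/2016_2017/</h1><hr><pre><a href="../">../</a>
<a href="1%20Buck%20%282017%29/">1 Buck (2017)/</a>                                     25-Nov-2019 10:25       -
<a href="1%20Mile%20to%20You%20%282017%29/">1 Mile to You (2017)/</a>                              25-Nov-2019 10:26       -
<a href="1%20Night%20%282016%29/">1 Night (2016)/</a>                                    25-Nov-2019 10:27       -
</pre><hr></body>
</html>
EOF

# The answer's command, with stdout redirected into output.txt.
awk 'BEGIN{print "title"} match($0,/%29\/">[^/]*/){print substr($0,RSTART+6,RLENGTH-5)}' tes.html > output.txt
cat output.txt

# Offsets for the first matching line only (exit stops after it).
awk 'match($0,/%29\/">[^/]*/){print "RSTART=" RSTART, "RLENGTH=" RLENGTH; exit}' tes.html
```

For the first line this prints RSTART=28 (the % of %29) and RLENGTH=19 (the 6 characters of %29/"> plus the 13 characters of 1 Buck (2017)). So substr starts at position 34 and takes 14 characters: the title plus the / that follows it.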

Answer 2

You can also use awk with FS set to '[><]' and print $3:

awk -F'[><]' 'BEGIN{ print "title" } /%29/ {print $3}' file
title
1 Buck (2017)/
1 Mile to You (2017)/
1 Night (2016)/
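Why $3 holds the title: with FS set to '[><]', every < and > ends a field. A minimal sketch that dumps the fields for one data line copied from tes.html:

```shell
# One data line copied verbatim from tes.html.
line='<a href="1%20Buck%20%282017%29/">1 Buck (2017)/</a>                                     25-Nov-2019 10:25       -'

# Split on < or > and print each field with its index.
printf '%s\n' "$line" |
awk -F'[><]' '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }'
```

$1 is empty (the line starts with <), $2 is a href="1%20Buck%20%282017%29/", $3 is 1 Buck (2017)/, $4 is /a, and $5 holds the date and size columns.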

Or, to test exactly your condition, require $2 to end with %29/":

awk -F'[><]' 'BEGIN{ print "title" } $2 ~ /%29\/"$/ {print $3}' file
title
1 Buck (2017)/
1 Mile to You (2017)/
1 Night (2016)/
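The two filters differ when %29 appears somewhere other than right before the closing /". A minimal sketch using a hypothetical sample.html whose second line (invented for illustration, not from the question) has %29 in the middle of its href:

```shell
# Hypothetical input: only the first line has %29 immediately before /".
cat > sample.html <<'EOF'
<a href="1%20Buck%20%282017%29/">1 Buck (2017)/</a>
<a href="Extras%20%282016%29%20Disc%202/">Extras (2016) Disc 2/</a>
EOF

echo '--- /%29/ anywhere on the line ---'
awk -F'[><]' '/%29/ {print $3}' sample.html

echo '--- $2 must end with %29/" ---'
awk -F'[><]' '$2 ~ /%29\/"$/ {print $3}' sample.html
```

The first command prints both titles; the second prints only 1 Buck (2017)/, because the second line's $2 ends with %202/" instead.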

2 solutions

Solution 1
2 votes, accepted, 2021-04-08 07:27:26

Solution 2
2 votes, 2021-04-08 09:12:43
