Trim a string up to certain characters in Bash

Question

I'm trying to make a bash script that will tell me the latest stable version of the Linux kernel.

The problem is that, while I can remove everything after certain characters, I don't seem to be able to delete everything before certain characters.

#!/bin/bash

wget=$(wget --output-document - --quiet www.kernel.org | \grep -A 1 "latest_link")

wget=${wget##.tar.xz\">}

wget=${wget%</a>}

echo "${wget}"

Somehow the output "ignores" the wget=${wget##.tar.xz\">} line.

Answer 1

You're trying to remove the longest match of the pattern .tar.xz\"> from the beginning of the string, but your string doesn't start with .tar.xz, so there is no match.

You have to add a leading * so that the pattern also covers everything before .tar.xz:

wget=${wget##*.tar.xz\">}
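
To see the difference between the two patterns without hitting the network, here is a minimal sketch. The sample string is an assumption about what the grep output might look like, not the actual markup of kernel.org, and the variable names are made up for illustration.

```bash
#!/bin/bash
# Hypothetical two-line sample standing in for the grep output (assumption).
html='<td id="latest_link">
<a href="./linux-4.10.8.tar.xz">4.10.8</a>'

# Without a leading *, the pattern must match at the very start of the
# string. The string starts with "<td", so nothing is removed.
no_match=${html##.tar.xz\">}

# With a leading *, the pattern covers everything up to and including
# the last occurrence of .tar.xz">
trimmed=${html##*.tar.xz\">}

# Strip the closing tag from the end.
version=${trimmed%</a>}

echo "$version"
```

Here no_match is still identical to html, while version holds only the text between the link tags.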

Also, because you're in a script and not an interactive shell, there shouldn't be any need to escape grep as \grep (presumably to prevent the use of an alias), since aliases are disabled in non-interactive shells.
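
As a rough illustration, a script can check whether alias expansion is active and, if an alias must be bypassed anyway, do so explicitly. This is a sketch assuming the script is run by bash in its default (non-POSIX) mode; the variable names are made up for the example.

```bash
#!/bin/bash
# Report whether this shell would expand aliases at all.
# When bash runs a script non-interactively this is normally "off".
if shopt -q expand_aliases; then
    alias_state="on"
else
    alias_state="off"
fi
echo "expand_aliases is ${alias_state}"

# Two ways to bypass an alias named grep, should one exist.
# Both count the lines containing "foo" in a two-line input.
count_escaped=$(printf 'foo\nbar\n' | \grep -c foo)
count_command=$(printf 'foo\nbar\n' | command grep -c foo)
echo "$count_escaped $count_command"
```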

And, as pointed out, naming a variable the same as an existing command (often seen: test) is bound to lead to confusion.

If you want to use command-line tools designed to deal with HTML, you could have a look at the W3C HTML-XML-utils (Ubuntu: apt install html-xml-utils). Using them, you could get the info you want as follows:

$ curl -sL www.kernel.org | hxselect 'td#latest_link' | hxextract a -
4.10.8

Or, in detail:

curl -sL www.kernel.org |     # Fetch page
hxselect 'td#latest_link' |   # Select td element with ID "latest_link"
hxextract a -                 # Extract link text ("-" for standard input)

Answer 2

Whenever I need to extract a substring in bash, I first see if I can brute-force it with a couple of cut(1) commands. In your case, the following appears to work:

wget=$(wget --output-document - --quiet www.kernel.org | \grep -A 1 "latest_link")
echo $wget | cut -d'>' -f3 | cut -d'<' -f1

I'm certain there's a more elegant way, but this has simple syntax that I never forget. Note that it will break if the contents of the wget variable gain extra ">" or "<" characters in the future.
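
The field positions are easier to see on a fixed string. The sketch below uses a hypothetical sample in place of the real page (an assumption about the markup) and joins the two lines explicitly with tr; the unquoted echo $wget above has the same joining effect through word splitting.

```bash
#!/bin/bash
# Hypothetical two-line sample standing in for the grep output (assumption).
sample='<td id="latest_link">
<a href="./linux-4.10.8.tar.xz">4.10.8</a>'

# cut works line by line, so join the two lines into one first.
# Split on ">":  field 1 = <td id="latest_link"
#                field 2 =  <a href="./linux-4.10.8.tar.xz"
#                field 3 = 4.10.8</a
# Then split field 3 on "<" and keep the part before it.
version=$(printf '%s' "$sample" | tr '\n' ' ' | cut -d'>' -f3 | cut -d'<' -f1)

echo "$version"
```

Any additional ">" before the link would shift the field numbers, which is the fragility mentioned above.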

Answer 3

It is not recommended to use shell tools such as grep, awk, and sed to parse HTML files.

However, if you want a quick one-liner, this awk command should do the job:

wget --output-document - --quiet www.kernel.org |
awk '/"latest_link"/ { getline; n=split($0, a, /[<>]/); print a[n-2] }'

4.10.8
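
To show what the awk program does step by step, here is a sketch run against a hypothetical sample rather than the live page (the sample markup is an assumption).

```bash
#!/bin/bash
# Hypothetical two-line sample standing in for the downloaded page (assumption).
sample='<td id="latest_link">
<a href="./linux-4.10.8.tar.xz">4.10.8</a>'

# /"latest_link"/  selects the line containing the marker
# getline          replaces $0 with the following line
# split(...)       breaks that line on every "<" or ">", giving
#                  "", a href="...", 4.10.8, /a, ""  (n = 5 pieces)
# a[n-2]           is therefore the link text
version=$(printf '%s\n' "$sample" |
  awk '/"latest_link"/ { getline; n=split($0, a, /[<>]/); print a[n-2] }')

echo "$version"
```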

Answer 4

sed method:

wget --output-document - --quiet www.kernel.org | \
  sed -n '/latest_link/{n;s/^.*">//;s/<.*//p}'

Output:

4.10.8
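
For reference, the same sed script can be tried offline on a hypothetical sample (the markup is an assumption, and the script is assumed to run under GNU sed), with each command explained in the comments.

```bash
#!/bin/bash
# Hypothetical two-line sample standing in for the downloaded page (assumption).
sample='<td id="latest_link">
<a href="./linux-4.10.8.tar.xz">4.10.8</a>'

# -n              print nothing unless asked to
# /latest_link/   act only on the line containing the marker
# n               move on to the next input line
# s/^.*">//       delete everything up to and including the last ">
# s/<.*//p        delete from the first remaining "<" and print the result
version=$(printf '%s\n' "$sample" |
  sed -n '/latest_link/{n;s/^.*">//;s/<.*//p}')

echo "$version"
```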


Answer 1
2 (accepted) 2017-04-06 17:18:30

Answer 2
1 2017-04-06 17:19:03

Answer 3
0 2017-04-06 17:27:29

Answer 4
0 2017-04-06 17:34:52
